(arrow) branch main updated: GH-41841: [R][CI] Remove more defunct rhub containers (#41828)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/main by this push: new 8f3bf67cca GH-41841: [R][CI] Remove more defunct rhub containers (#41828) 8f3bf67cca is described below commit 8f3bf67cca32902e241b1857502247918861a3f8 Author: Jonathan Keane AuthorDate: Tue May 28 17:26:09 2024 -0500 GH-41841: [R][CI] Remove more defunct rhub containers (#41828) Testing CI to see if we can replicate the incoming NOTEs: ``` Found the following (possibly) invalid file URIs: URI: articles/read_write.html From: README.md URI: articles/data_wrangling.html From: README.md URI: reference/acero.html From: README.md URI: articles/install.html From: README.md URI: articles/install_nightly.html From: README.md ``` I wasn't able to replicate them in CI (even with `_R_CHECK_CRAN_INCOMING_REMOTE_` set to true, and installing pandoc so that the docs could be munged.) But in the process realized we were running old rhub images that aren't updated anymore (thanks, @ thisisnic). 
Also did a bit of cleanup of `--run-donttest` which is now no longer needed (was removed in favor of the env var in 4.0) * GitHub Issue: #41841 Authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- .github/workflows/r.yml | 5 ++-- ci/scripts/r_install_system_dependencies.sh | 43 +++-- ci/scripts/r_test.sh| 9 +++--- dev/tasks/r/github.linux.cran.yml | 9 +++--- r/Makefile | 4 +-- 5 files changed, 35 insertions(+), 35 deletions(-) diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml index aba7734765..6bd940f806 100644 --- a/.github/workflows/r.yml +++ b/.github/workflows/r.yml @@ -370,11 +370,12 @@ jobs: MAKEFLAGS = paste0("-j", parallel::detectCores()), ARROW_R_DEV = TRUE, "_R_CHECK_FORCE_SUGGESTS_" = FALSE, -"_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = TRUE +"_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = TRUE, +"_R_CHECK_DONTTEST_EXAMPLES_" = TRUE ) rcmdcheck::rcmdcheck(".", build_args = '--no-build-vignettes', -args = c('--no-manual', '--as-cran', '--ignore-vignettes', '--run-donttest'), +args = c('--no-manual', '--as-cran', '--ignore-vignettes'), error_on = 'warning', check_dir = 'check', timeout = 3600 diff --git a/ci/scripts/r_install_system_dependencies.sh b/ci/scripts/r_install_system_dependencies.sh index be0d75ef23..7ddc2604f6 100755 --- a/ci/scripts/r_install_system_dependencies.sh +++ b/ci/scripts/r_install_system_dependencies.sh @@ -21,29 +21,30 @@ set -ex : ${ARROW_SOURCE_HOME:=/arrow} -if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_GCS" == "ON" ] || [ "$ARROW_R_DEV" == "TRUE" ]; then - # Figure out what package manager we have - if [ "`which dnf`" ]; then -PACKAGE_MANAGER=dnf - elif [ "`which yum`" ]; then -PACKAGE_MANAGER=yum - elif [ "`which zypper`" ]; then -PACKAGE_MANAGER=zypper - else -PACKAGE_MANAGER=apt-get -apt-get update - fi +# Figure out what package manager we have +if [ "`which dnf`" ]; then + PACKAGE_MANAGER=dnf +elif [ "`which yum`" ]; then + PACKAGE_MANAGER=yum +elif [ "`which zypper`" ]; then + 
PACKAGE_MANAGER=zypper +else + PACKAGE_MANAGER=apt-get + apt-get update +fi - # Install curl and OpenSSL for S3/GCS support - case "$PACKAGE_MANAGER" in -apt-get) - apt-get install -y libcurl4-openssl-dev libssl-dev - ;; -*) - $PACKAGE_MANAGER install -y libcurl-devel openssl-devel - ;; - esac +# Install curl and OpenSSL (technically, only needed for S3/GCS support, but +# installing the R curl package fails without it) +case "$PACKAGE_MANAGER" in + apt-get) +apt-get install -y libcurl4-openssl-dev libssl-dev +;; + *) +$PACKAGE_MANAGER install -y libcurl-devel openssl-devel +;; +esac +if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_GCS" == "ON" ] || [ "$ARROW_R_DEV" == "TRUE" ]; then # The Dockerfile should have put this file here if [ "$ARROW_S3" == "ON" ] && [ -f "${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh" ] && [ "`which wget`" ]; then "${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh" latest /usr/local diff --git a/ci/scripts/r_test.sh b/ci/scripts/r_test.sh index e13da45e
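The package-manager detection in the script above can be sketched on its own. This is a minimal illustration, not the project's script: the helper name `first_available` is hypothetical, and it uses `command -v` in place of the backtick `which` calls shown in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of the arrow repository): print the first
# candidate command that exists on PATH; return non-zero if none is found.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Same probing order as the diff (dnf, yum, zypper), falling back to apt-get.
PACKAGE_MANAGER=$(first_available dnf yum zypper || echo apt-get)
echo "Using package manager: $PACKAGE_MANAGER"
```

Unlike the script in the diff, this sketch does not run `apt-get update` on the fallback path; it only selects a name.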
(arrow) annotated tag r-universe-release updated (f39b3f343a -> 705303e3d9)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to annotated tag r-universe-release
in repository https://gitbox.apache.org/repos/asf/arrow.git

*** WARNING: tag r-universe-release was modified! ***

    from f39b3f343a (commit)
      to 705303e3d9 (tag)
 tagging f39b3f343acc435333e6502b817e3be40ce54543 (commit)
replaces apache-arrow-16.1.0
      by Jonathan Keane
      on Sat May 25 17:34:07 2024 -0500

- Log -
latest R package release on r-universe
---

No new revisions were added by this update.

Summary of changes:
(arrow) tag r-universe-release deleted (was ac9707663c)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to tag r-universe-release
in repository https://gitbox.apache.org/repos/asf/arrow.git

*** WARNING: tag r-universe-release was deleted! ***

     was ac9707663c Remove badges in README

The revisions that were on this tag are still contained in
other references; therefore, this change does not discard any commits
from the repository.
(arrow) branch main updated: GH-41450: [R][CI] rhub/container follow ons (#41451)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/main by this push: new 6d03215543 GH-41450: [R][CI] rhub/container follow ons (#41451) 6d03215543 is described below commit 6d0321554374523ae0633d6bfe42cdeeb3b5d145 Author: Jonathan Keane AuthorDate: Sun May 12 11:00:26 2024 -0400 GH-41450: [R][CI] rhub/container follow ons (#41451) More CI changes: * GitHub Issue: #41450 (specifically use the rhub containers approach for clang sanitizer, remove some of our work arounds) * Remove CentOS 7 CI support for R Authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- .env | 3 --- .github/workflows/r.yml| 3 +-- ci/docker/linux-r.dockerfile | 3 --- ci/scripts/java_jni_manylinux_build.sh | 3 --- ci/scripts/r_docker_configure.sh | 20 ci/scripts/r_sanitize.sh | 2 ++ ci/scripts/r_test.sh | 3 --- dev/tasks/r/azure.linux.yml| 1 - dev/tasks/r/github.packages.yml| 7 +++ dev/tasks/tasks.yml| 13 ++--- docker-compose.yml | 16 ++-- r/tools/test-nixlibs.R | 4 r/tools/ubsan.supp | 1 + r/vignettes/install.Rmd| 33 - 14 files changed, 15 insertions(+), 97 deletions(-) diff --git a/.env b/.env index ab2e4b4fbe..27474b2c73 100644 --- a/.env +++ b/.env @@ -86,9 +86,6 @@ ARROW_R_DEV=TRUE R_PRUNE_DEPS=FALSE TZ=UTC -# Any non-empty string will install devtoolset-${DEVTOOLSET_VERSION} -DEVTOOLSET_VERSION= - # Used through docker-compose.yml and serves as the default version for the # ci/scripts/install_vcpkg.sh script. Prefer to use short SHAs to keep the # docker tags more readable. 
diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml index 8228aaad7c..aba7734765 100644 --- a/.github/workflows/r.yml +++ b/.github/workflows/r.yml @@ -192,12 +192,11 @@ jobs: fail-fast: false matrix: config: - - { org: "rhub", image: "ubuntu-gcc12", tag: "latest", devtoolset: "" } + - { org: "rhub", image: "ubuntu-gcc12", tag: "latest" } env: R_ORG: ${{ matrix.config.org }} R_IMAGE: ${{ matrix.config.image }} R_TAG: ${{ matrix.config.tag }} - DEVTOOLSET_VERSION: ${{ matrix.config.devtoolset }} steps: - name: Checkout Arrow uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0 diff --git a/ci/docker/linux-r.dockerfile b/ci/docker/linux-r.dockerfile index d368a6629c..7b7e989adc 100644 --- a/ci/docker/linux-r.dockerfile +++ b/ci/docker/linux-r.dockerfile @@ -27,9 +27,6 @@ ENV R_BIN=${r_bin} ARG r_dev=FALSE ENV ARROW_R_DEV=${r_dev} -ARG devtoolset_version= -ENV DEVTOOLSET_VERSION=${devtoolset_version} - ARG r_prune_deps=FALSE ENV R_PRUNE_DEPS=${r_prune_deps} diff --git a/ci/scripts/java_jni_manylinux_build.sh b/ci/scripts/java_jni_manylinux_build.sh index da4987d307..4921ce170b 100755 --- a/ci/scripts/java_jni_manylinux_build.sh +++ b/ci/scripts/java_jni_manylinux_build.sh @@ -35,9 +35,6 @@ echo "=== Clear output directories and leftovers ===" rm -rf ${build_dir} echo "=== Building Arrow C++ libraries ===" -devtoolset_version=$(rpm -qa "devtoolset-*-gcc" --queryformat %{VERSION} | \ - grep -o "^[0-9]*") -devtoolset_include_cpp="/opt/rh/devtoolset-${devtoolset_version}/root/usr/include/c++/${devtoolset_version}" : ${ARROW_ACERO:=ON} export ARROW_ACERO : ${ARROW_BUILD_TESTS:=ON} diff --git a/ci/scripts/r_docker_configure.sh b/ci/scripts/r_docker_configure.sh index 52db2e6df6..8a962fe576 100755 --- a/ci/scripts/r_docker_configure.sh +++ b/ci/scripts/r_docker_configure.sh @@ -67,26 +67,6 @@ sloppiness = include_file_ctime hash_dir = false" >> ~/.ccache/ccache.conf fi -# Special hacking to try to reproduce quirks on centos using non-default 
build -# tooling. -if [[ -n "$DEVTOOLSET_VERSION" ]]; then - $PACKAGE_MANAGER install -y centos-release-scl - $PACKAGE_MANAGER install -y "devtoolset-$DEVTOOLSET_VERSION" - - # Enable devtoolset here so that `which gcc` finds the right compiler below - source /opt/rh/devtoolset-${DEVTOOLSET_VERSION}/enable - - # Build images which require the devtoolset don't have CXX17 variables - # set as the system compiler doesn't support C++17 - if [ ! "`{R_BIN} CMD config CXX17`" ]; then -mkdir -p ~/.R -echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars -echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars
(arrow) branch main updated: GH-41402: [CI][R] Update our backwards compatibility CI any other R 4.4 cleanups (#41403)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/main by this push:
     new 6eb0b37386 GH-41402: [CI][R] Update our backwards compatibility CI any other R 4.4 cleanups (#41403)
6eb0b37386 is described below

commit 6eb0b37386ecbfc4108e914d6dadb8b049a6f549
Author: Jonathan Keane
AuthorDate: Mon Apr 29 08:39:07 2024 -0500

    GH-41402: [CI][R] Update our backwards compatibility CI any other R 4.4 cleanups (#41403)

    ### Rationale for this change

    Keep up with the state of the world, ensure we are maintaining backwards compatibility.

    Resolves #41402

    ### What changes are included in this PR?

    * Bump to 4.4 as the release
    * Remove old 3.6 jobs now that we no longer support that; clean up code where we hardcode things for 3.6 and below
    * Move many of our CI jobs to [rhub's new containers](https://github.com/r-hub/containers). We were accidentally running stale R devel (from December 2023) because the other rhub images stopped being updated. (One exception to be done as a follow on: #41416)
    * Resolve a number of extended test failures

    With this PR R extended tests should be all green with the exceptions of:
    * Two sanitizer jobs (test-fedora-r-clang-sanitizer, test-ubuntu-r-sanitizer), which are being investigated / fixed in #41421
    * Valgrind: I'm running one last run with a new suppression file.
    * Binary jobs: these work but fail at upload, see https://github.com/apache/arrow/pull/41403#discussion_r1582245207
    * Windows R Release: failing on main, #41398

    ### Are these changes tested?

    By definition.

    ### Are there any user-facing changes?

    No.
* GitHub Issue: #41402 Lead-authored-by: Jonathan Keane Co-authored-by: Jacob Wujciak-Jens Signed-off-by: Jonathan Keane --- .env | 6 +-- .github/workflows/r.yml| 4 +- ci/docker/linux-apt-docs.dockerfile| 2 +- ci/docker/linux-apt-lint.dockerfile| 2 +- ci/docker/linux-apt-r.dockerfile | 2 +- ci/etc/valgrind-cran.supp | 20 +++- ci/scripts/r_sanitize.sh | 4 +- ci/scripts/r_test.sh | 7 ++- ci/scripts/r_valgrind.sh | 2 +- .../r/github.linux.arrow.version.back.compat.yml | 2 + dev/tasks/r/github.linux.offline.build.yml | 2 +- dev/tasks/r/github.linux.versions.yml | 2 +- dev/tasks/r/github.packages.yml| 10 ++-- dev/tasks/tasks.yml| 12 ++--- docker-compose.yml | 5 +- r/DESCRIPTION | 2 +- r/R/dplyr-funcs-type.R | 2 +- r/R/util.R | 14 -- r/tests/testthat/test-Array.R | 5 -- r/tests/testthat/test-RecordBatch.R| 16 ++- r/tests/testthat/test-Table.R | 4 -- r/tests/testthat/test-altrep.R | 7 ++- r/tests/testthat/test-chunked-array.R | 5 -- r/tests/testthat/test-dplyr-collapse.R | 10 r/tests/testthat/test-dplyr-funcs-datetime.R | 32 +++-- r/tests/testthat/test-dplyr-funcs-type.R | 3 +- r/tests/testthat/test-dplyr-glimpse.R | 5 -- r/tests/testthat/test-scalar.R | 4 -- r/tools/test-nixlibs.R | 7 ++- r/vignettes/developers/docker.Rmd | 50 ++-- r/vignettes/install.Rmd| 55 ++ 31 files changed, 139 insertions(+), 164 deletions(-) diff --git a/.env b/.env index d9f875a4d4..ab2e4b4fbe 100644 --- a/.env +++ b/.env @@ -71,12 +71,12 @@ NUMBA=latest NUMPY=latest PANDAS=latest PYTHON=3.8 -R=4.2 +R=4.4 SPARK=master TURBODBC=latest -# These correspond to images on Docker Hub that contain R, e.g. rhub/ubuntu-gcc-release:latest -R_IMAGE=ubuntu-gcc-release +# These correspond to images on Docker Hub that contain R, e.g. 
rhub/ubuntu-release:latest +R_IMAGE=ubuntu-release R_ORG=rhub R_TAG=latest diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml index 05c85fa6dc..8228aaad7c 100644 --- a/.github/workflows/r.yml +++ b/.github/workflows/r.yml @@ -121,7 +121,7 @@ jobs: strategy: fail-fast: false matrix: -r: ["4.3"] +r: ["4.4"] ubuntu: [20.04] force-tests: ["true"] env: @@ -192,7 +192,7 @@ jobs: fail-fast: false matrix: config: - - { org: "rhub", image: "
(arrow) branch main updated (be3baf2697 -> 3886cf1d43)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from be3baf2697 GH-40680: [Java] Test JDK 22 in CI (#41038)
     add 3886cf1d43 GH-40991: [R] Prefer r-universe, add a startup message (#41019)

No new revisions were added by this update.

Summary of changes:
 r/DESCRIPTION          |  2 +-
 r/R/arrow-info.R       |  3 ++-
 r/R/arrow-package.R    | 71 +++---
 r/R/install-arrow.R    | 14 +++---
 r/man/arrow-package.Rd |  4 +--
 r/man/format_schema.Rd | 18 +
 6 files changed, 70 insertions(+), 42 deletions(-)
 create mode 100644 r/man/format_schema.Rd
(arrow) branch main updated (81c9d30be3 -> 640667664a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 81c9d30be3 GH-40155: [Go][FlightRPC][FlightSQL] Implement Session Management (#40284)
     add 640667664a GH-40323: [R] [CI] Use rocker/r-ver instead of library/r-base (#40321)

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)
(arrow) branch main updated: GH-40268: [Archery] Bump the version of pygit2, adapt to API changes (#40269)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/main by this push: new 30e6d72242 GH-40268: [Archery] Bump the version of pygit2, adapt to API changes (#40269) 30e6d72242 is described below commit 30e6d72242e376baa598b2e8f1d9b80d800a974c Author: Jonathan Keane AuthorDate: Fri Mar 1 07:40:09 2024 -0600 GH-40268: [Archery] Bump the version of pygit2, adapt to API changes (#40269) ### Rationale for this change `archery crossbow submit ...` fails with newer versions of pygit2 ### What changes are included in this PR? Adapt away from deprecated [sic] APIs in pygit2 to ones that work with current versions, bump the pin ### Are these changes tested? Manually, yes, I can use `archery crossbow submit ...` again. CI will run using archery in a bunch of places on this PR too. ### Are there any user-facing changes? No * GitHub Issue: #40268 Authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- .github/workflows/archery.yml| 2 +- .github/workflows/comment_bot.yml| 2 +- .github/workflows/dev.yml| 4 ++-- .github/workflows/docs.yml | 2 +- .github/workflows/docs_light.yml | 2 +- .github/workflows/java_nightly.yml | 2 +- .github/workflows/pr_bot.yml | 2 +- .github/workflows/r_nightly.yml | 6 +++--- dev/archery/archery/crossbow/core.py | 2 +- dev/archery/archery/docker/cli.py| 2 +- dev/archery/setup.py | 6 +- dev/tasks/java-jars/github.yml | 2 +- dev/tasks/macros.jinja | 4 ++-- 13 files changed, 21 insertions(+), 17 deletions(-) diff --git a/.github/workflows/archery.yml b/.github/workflows/archery.yml index d5f419f8a7..dbd24796db 100644 --- a/.github/workflows/archery.yml +++ b/.github/workflows/archery.yml @@ -59,7 +59,7 @@ jobs: - name: Setup Python uses: actions/setup-python@v5 with: - python-version: '3.8' + python-version: '3.12' - name: Install pygit2 binary wheel run: pip install pygit2 --only-binary 
pygit2 - name: Install Archery, Crossbow- and Test Dependencies diff --git a/.github/workflows/comment_bot.yml b/.github/workflows/comment_bot.yml index dbcbbff549..038a468a81 100644 --- a/.github/workflows/comment_bot.yml +++ b/.github/workflows/comment_bot.yml @@ -43,7 +43,7 @@ jobs: - name: Set up Python uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with: - python-version: 3.8 + python-version: 3.12 - name: Install Archery and Crossbow dependencies run: pip install -e arrow/dev/archery[bot] - name: Handle GitHub comment event diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml index 4892767324..77efda58cb 100644 --- a/.github/workflows/dev.yml +++ b/.github/workflows/dev.yml @@ -43,7 +43,7 @@ jobs: - name: Setup Python uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with: - python-version: 3.8 + python-version: 3.12 - name: Setup Archery run: pip install -e dev/archery[docker] - name: Execute Docker Build @@ -90,7 +90,7 @@ jobs: - name: Install Python uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with: - python-version: '3.8' + python-version: '3.12' - name: Install Ruby uses: ruby/setup-ruby@250fcd6a742febb1123a77a841497ccaa8b9e939 # v1.152.0 with: diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index e394347e95..82b43ee236 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -53,7 +53,7 @@ jobs: - name: Setup Python uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with: - python-version: 3.8 + python-version: 3.12 - name: Setup Archery run: pip install -e dev/archery[docker] - name: Execute Docker Build diff --git a/.github/workflows/docs_light.yml b/.github/workflows/docs_light.yml index 5303531f34..306fc51350 100644 --- a/.github/workflows/docs_light.yml +++ b/.github/workflows/docs_light.yml @@ -59,7 +59,7 @@ jobs: - name: Setup Python uses: 
actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0 with: - python-version: 3.8 + python-version: 3.12 - name: Setup Archery run: pip install -e dev/archery[docker] - name: Execute Docker Build diff --git a/.github/workflows/java_nightly.yml b/.github/workflows/java_nightly.yml index c19576d2f6..c535dc4a07 100644 --- a/.github/workflows/java_nightly.yml +++ b/.github/workflows
(arrow) branch main updated (c6f20a2348 -> 2fbf22a736)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from c6f20a2348 GH-40276: [C++] Fix an simple buffer-overflow case in decimal_benchmark (#40277)
     add 2fbf22a736 GH-40248: [R] fallback to the correct libtool when we find a GNU one (#40259)

No new revisions were added by this update.

Summary of changes:
 cpp/cmake_modules/BuildUtils.cmake       | 22 +-
 dev/tasks/r/github.macos-linux.local.yml | 12
 2 files changed, 33 insertions(+), 1 deletion(-)
(arrow) branch main updated (4ceb661013 -> b684028dfb)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 4ceb661013 GH-39880: [Python][CI] Pin moto<5 for dask integration tests (#39881)
     add b684028dfb GH-39859: [R] Remove macOS from the allow list (#39861)

No new revisions were added by this update.

Summary of changes:
 r/tools/nixlibs-allowlist.txt | 1 -
 r/tools/nixlibs.R             | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)
[arrow] branch main updated: GH-38216: [R] open_dataset(format = "json") not documented (#38258)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/main by this push: new ac581fd2a8 GH-38216: [R] open_dataset(format = "json") not documented (#38258) ac581fd2a8 is described below commit ac581fd2a87b35c872cf334bb147851fe1287714 Author: Divyansh200102 <146909065+divyansh200...@users.noreply.github.com> AuthorDate: Tue Oct 17 23:47:32 2023 +0530 GH-38216: [R] open_dataset(format = "json") not documented (#38258) fixes #38216 * Closes: #38216 Lead-authored-by: Divyansh200102 Co-authored-by: Divyansh200102 <146909065+divyansh200...@users.noreply.github.com> Co-authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- r/R/dataset.R | 3 ++- r/man/open_dataset.Rd | 7 --- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/r/R/dataset.R b/r/R/dataset.R index 90e6516927..2400d08393 100644 --- a/r/R/dataset.R +++ b/r/R/dataset.R @@ -112,7 +112,8 @@ #' * "csv"/"text", aliases for the same thing (because comma is the default #' delimiter for text files #' * "tsv", equivalent to passing `format = "text", delimiter = "\t"` -#' +#' * "json", for JSON format datasets Note: only newline-delimited JSON (aka ND-JSON) datasets +#' are currently supported #' Default is "parquet", unless a `delimiter` is also specified, in which case #' it is assumed to be "text". #' @param ... 
additional arguments passed to `dataset_factory()` when `sources` diff --git a/r/man/open_dataset.Rd b/r/man/open_dataset.Rd index 94b537a1d3..7c3d32289f 100644 --- a/r/man/open_dataset.Rd +++ b/r/man/open_dataset.Rd @@ -74,10 +74,11 @@ only version 2 files are supported \item "csv"/"text", aliases for the same thing (because comma is the default delimiter for text files \item "tsv", equivalent to passing \verb{format = "text", delimiter = "\\t"} -} - +\item "json", for JSON format datasets Note: only newline-delimited JSON (aka ND-JSON) datasets +are currently supported Default is "parquet", unless a \code{delimiter} is also specified, in which case -it is assumed to be "text".} +it is assumed to be "text". +}} \item{factory_options}{list of optional FileSystemFactoryOptions: \itemize{
[arrow] branch main updated: MINOR: [R] Avoid stray output from expr when checking for 10.13 (#38303)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/main by this push:
     new 40571db03c MINOR: [R] Avoid stray output from expr when checking for 10.13 (#38303)
40571db03c is described below

commit 40571db03cc7f819f33a05dd421ef86816fe0502
Author: Jacob Wujciak-Jens
AuthorDate: Tue Oct 17 17:22:18 2023 +0200

    MINOR: [R] Avoid stray output from expr when checking for 10.13 (#38303)

    ### Rationale for this change

    `expr` was printing the number of matching chars which showed up as noise in the log (which we want to avoid as much as possible to avoid any false positive checks)

    See https://github.com/apache/arrow/pull/38236#issuecomment-1761679457 for @ jonkeane's investigation.

    ### What changes are included in this PR?

    Replace use of expr with test.

    ### Are these changes tested?

    Crossbow

    Lead-authored-by: Jacob Wujciak-Jens
    Co-authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/configure | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/r/configure b/r/configure
index addf7b59c7..c957c9946f 100755
--- a/r/configure
+++ b/r/configure
@@ -264,7 +264,10 @@ set_pkg_vars () {
     PKG_CFLAGS="$PKG_CFLAGS $ARROW_R_CXXFLAGS"
   fi

-  if [ "$UNAME" = "Darwin" ] && expr $(sw_vers -productVersion) : '10\.13'; then
+  # We use expr because the product version returns more than just 10.13 and we want to
+  # match the substring. However, expr always outputs the number of matched characters
+  # to stdout, to avoid noise in the log we redirect the output to /dev/null
+  if [ "$UNAME" = "Darwin" ] && expr $(sw_vers -productVersion) : '10\.13' >/dev/null 2>&1; then
     # avoid C++17 availability warnings on macOS < 11
     PKG_CFLAGS="$PKG_CFLAGS -D_LIBCPP_DISABLE_AVAILABILITY"
   fi
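To make the `expr` behaviour described in the commit above concrete, here is a small sketch. The helper name `is_macos_10_13` is hypothetical, and the version strings are made-up inputs standing in for `sw_vers -productVersion`, so the snippet can run on any POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a version string starts with "10.13".
# `expr STRING : REGEX` anchors the match at the start of STRING, prints the
# number of matched characters, and exits non-zero when nothing matches.
# Redirecting stdout keeps that count out of the log.
is_macos_10_13() {
  expr "$1" : '10\.13' >/dev/null 2>&1
}

# Without the redirect, expr writes the match length to stdout.
expr "10.13.6" : '10\.13'   # prints 5

if is_macos_10_13 "10.13.6"; then
  echo "10.13.x detected"
fi
```

The exit status is what the `if` in `r/configure` relies on; the printed count is only a side effect, which is why discarding it is safe.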
[arrow] branch main updated: GH-33807: [R] Add a message if we detect running under emulation (#37777)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/main by this push:
     new 64ad8e564e GH-33807: [R] Add a message if we detect running under emulation (#37777)
64ad8e564e is described below

commit 64ad8e564ea013101b8565ce200e54e5c85bac8d
Author: Jonathan Keane
AuthorDate: Tue Sep 19 11:15:27 2023 -0500

    GH-33807: [R] Add a message if we detect running under emulation (#37777)

    Resolves #33807 and #37034

    ### Rationale for this change

    If someone is running R under emulation, arrow segfaults without error. We can detect this when we load so can also warn people that this is not recommended. Though the version of R being run is not directly an arrow issue, arrow fails very quickly in this configuration.

    ### What changes are included in this PR?

    Detect when running under rosetta (on macOS only) and warn when the library is attached

    ### Are these changes tested?

    No, given the paucity of ARM-based mac CI, testing this organically would be difficult. But the logic is straightforward.

    ### Are there any user-facing changes?

    Yes, a warning when someone loads arrow under emulation.
    * Closes: #33807

    Authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/R/arrow-package.R | 21 +
 r/R/install-arrow.R |  4 +---
 r/README.md         |  2 ++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 8f44f8936b..09183250ba 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -183,6 +183,22 @@ configure_tzdb <- function() {
   # Just to be extra safe, let's wrap this in a try();
   # we don't want a failed startup message to prevent the package from loading
   try({
+    # On MacOS only, Check if we are running in under emulation, and warn this will not work
+    if (on_rosetta()) {
+      packageStartupMessage(
+        paste(
+          "Warning:",
+          " It appears that you are running R and Arrow in emulation (i.e. you're",
+          " running an Intel version of R on a non-Intel mac). This configuration is",
+          " not supported by arrow, you should install a native (arm64) build of R",
+          " and use arrow with that. See https://cran.r-project.org/bin/macosx/",
+          "",
+          sep = "\n"
+        )
+      )
+    }
+
+
     features <- arrow_info()$capabilities
     # That has all of the #ifdef features, plus the compression libs and the
     # string libraries (but not the memory allocators, they're added elsewhere)
@@ -225,6 +241,11 @@ on_macos_10_13_or_lower <- function() {
     package_version(unname(Sys.info()["release"])) < "18.0.0"
 }

+on_rosetta <- function() {
+  identical(tolower(Sys.info()[["sysname"]]), "darwin") &&
+    identical(system("sysctl -n sysctl.proc_translated", intern = TRUE), "1")
+}
+
 option_use_threads <- function() {
   !is_false(getOption("arrow.use_threads"))
 }
diff --git a/r/R/install-arrow.R b/r/R/install-arrow.R
index 8380fa2af9..7017d4f39b 100644
--- a/r/R/install-arrow.R
+++ b/r/R/install-arrow.R
@@ -61,7 +61,6 @@ install_arrow <- function(nightly = FALSE,
                           verbose = Sys.getenv("ARROW_R_DEV", FALSE),
                           repos = getOption("repos"),
                           ...) {
-  sysname <- tolower(Sys.info()[["sysname"]])
   conda <- isTRUE(grepl("conda", R.Version()$platform))

   if (conda) {
@@ -80,8 +79,7 @@ install_arrow <- function(nightly = FALSE,
     # On the M1, we can't use the usual autobrew, which pulls Intel dependencies
     apple_m1 <- grepl("arm-apple|aarch64.*darwin", R.Version()$platform)
     # On Rosetta, we have to build without JEMALLOC, so we also can't autobrew
-    rosetta <- identical(sysname, "darwin") && identical(system("sysctl -n sysctl.proc_translated", intern = TRUE), "1")
-    if (rosetta) {
+    if (on_rosetta()) {
       Sys.setenv(ARROW_JEMALLOC = "OFF")
     }
     if (apple_m1 || rosetta) {
diff --git a/r/README.md b/r/README.md
index d343d6979c..3c1e3570ff 100644
--- a/r/README.md
+++ b/r/README.md
@@ -73,6 +73,8 @@ additional steps should be required.

 There are some special cases to note:

+- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes.
+
 - On Linux the installation process can sometimes be more involved because CRAN
 does not host binaries for Linux. For more information please see the [installation
 guide](https://arrow.apache.org/docs/r/articles/install.html).
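The emulation check added in this commit boils down to two conditions: the system is Darwin, and `sysctl -n sysctl.proc_translated` reports `1`. A shell sketch of the same logic follows. The function names are hypothetical, and the decision is split into a pure helper so it can be exercised on machines that are not Macs.

```shell
#!/bin/sh
# Hypothetical helper: decide from a system name and a proc_translated value
# whether the process is running under translation (Rosetta).
is_translated() {
  sysname=$1
  flag=$2
  [ "$sysname" = "Darwin" ] && [ "$flag" = "1" ]
}

# Wrapper that gathers the real inputs. Where the sysctl key (or sysctl
# itself) is unavailable, the query fails and the flag is simply empty.
on_rosetta() {
  is_translated "$(uname -s)" "$(sysctl -n sysctl.proc_translated 2>/dev/null)"
}

if on_rosetta; then
  echo "Running under emulation; a native (arm64) build of R is recommended"
else
  echo "Not running under emulation"
fi
```

Treating a failed or empty `sysctl` result as "not translated" matches the R code's use of `identical(..., "1")`, which is only true for an exact `"1"`.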
[arrow] branch master updated: GH-15205: [R] Fix a parquet-fixture finding in R tests (#15207)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 6bd847b2ae GH-15205: [R] Fix a parquet-fixture finding in R tests (#15207)
6bd847b2ae is described below

commit 6bd847b2aefdb0f10eaf83a3bfe2dc8ee269e8e4
Author: Jonathan Keane
AuthorDate: Fri Jan 6 08:25:20 2023 -0600

    GH-15205: [R] Fix a parquet-fixture finding in R tests (#15207)

    A follow on to #15197 where we actually force these tests when the force-tests job is run + make sure that we look at the root of the filesystem for the fixtures

    * Closes: #15205

    Authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 .github/workflows/r.yml         |  2 ++
 docker-compose.yml              |  1 +
 r/tests/testthat/test-parquet.R | 23 +--
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 9173f0e530..e7b1ee06e9 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -69,6 +69,7 @@ jobs:
         uses: actions/checkout@v3
         with:
           fetch-depth: 0
+          submodules: recursive
       - name: Cache Docker Volumes
         uses: actions/cache@v3
         with:
@@ -137,6 +138,7 @@ jobs:
         uses: actions/checkout@v3
         with:
           fetch-depth: 0
+          submodules: recursive
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
diff --git a/docker-compose.yml b/docker-compose.yml
index 23583d6b65..df497a2de1 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1242,6 +1242,7 @@ services:
       LIBARROW_BUILD: 'false'
       NOT_CRAN: 'true'
       ARROW_R_DEV: ${ARROW_R_DEV}
+      ARROW_SOURCE_HOME: '/arrow'
     volumes: *ubuntu-volumes
     command: >
       /bin/bash -c "
diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R
index be71d813bd..e1e54a5139 100644
--- a/r/tests/testthat/test-parquet.R
+++ b/r/tests/testthat/test-parquet.R
@@ -458,22 +458,17 @@ test_that("Can read parquet with nested lists and maps", {
   # * ../cpp/submodules/parquet-testing/data
   # ARROW_SOURCE_HOME is set in many of our CI setups, so that will find the files
   # the .. version should catch some (thought not all) ways of running tests locally
-  parquet_test_data <- file.path(
-    Sys.getenv("ARROW_SOURCE_HOME", test_path("..")),
-    "cpp",
-    "submodules",
-    "parquet-testing",
-    "data"
-  )
-  skip_if_not(dir.exists(parquet_test_data), "Parquet test data missing")
+  base_path <- Sys.getenv("ARROW_SOURCE_HOME", "..")
+  # make this a full path, at the root of the filesystem if we're using ARROW_SOURCE_HOME
+  if (base_path != "..") {
+    base_path <- file.path("", base_path)
+  }
+  parquet_test_data <- file.path(base_path, "cpp", "submodules", "parquet-testing", "data")
+  skip_if_not(dir.exists(parquet_test_data) | force_tests(), "Parquet test data missing")

   pq <- read_parquet(paste0(parquet_test_data, "/nested_lists.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(pq$a$type, list_of(list_of(list_of(utf8()))), ignore_attr = TRUE)
+  expect_type_equal(pq$a, list_of(field("element", list_of(field("element", list_of(field("element", utf8())))))))

   pq <- read_parquet(paste0(parquet_test_data, "/nested_maps.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(
-    pq$a$type,
-    map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE))),
-    ignore_attr = TRUE
-  )
+  expect_true(pq$a$type == map_of(utf8(), map_of(int32(), field("value", boolean(), nullable = FALSE))))
 })
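The fixture lookup in this commit prefers `ARROW_SOURCE_HOME` and falls back to `..` when the variable is unset. A shell sketch of that fallback is below. The function name is hypothetical, and the sketch deliberately leaves out the extra root-of-filesystem prefixing that the R code applies via `file.path("", base_path)`.

```shell
#!/bin/sh
# Hypothetical helper: print the directory expected to hold the
# parquet-testing fixtures. ${VAR-default} substitutes the default only when
# the variable is unset, like the second argument of R's Sys.getenv().
parquet_test_data_dir() {
  base_path=${ARROW_SOURCE_HOME-..}
  printf '%s\n' "$base_path/cpp/submodules/parquet-testing/data"
}

data_dir=$(parquet_test_data_dir)
if [ -d "$data_dir" ]; then
  echo "Parquet test data found in $data_dir"
else
  echo "Parquet test data missing (looked in $data_dir)"
fi
```

The directory check mirrors the `skip_if_not(dir.exists(...))` guard: a missing submodule checkout produces a message rather than a failure.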
[arrow] branch master updated: GH-15001: [R] Fix Parquet datatype test failure (#15197)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new d4a0c9e8be GH-15001: [R] Fix Parquet datatype test failure (#15197) d4a0c9e8be is described below commit d4a0c9e8be8f2730dd80be9934e27aa6bd4a0850 Author: Will Jones AuthorDate: Thu Jan 5 09:02:19 2023 -0800 GH-15001: [R] Fix Parquet datatype test failure (#15197) * Closes: #15001 Lead-authored-by: Will Jones Co-authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- r/tests/testthat/test-parquet.R | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R index 591805d4ff..be71d813bd 100644 --- a/r/tests/testthat/test-parquet.R +++ b/r/tests/testthat/test-parquet.R @@ -453,12 +453,27 @@ test_that("deprecated int96 timestamp unit can be specified when reading Parquet }) test_that("Can read parquet with nested lists and maps", { - parquet_test_data <- test_path("../../../cpp/submodules/parquet-testing/data") + # Construct the path to the parquet-testing submodule. This will search: + # * $ARROW_SOURCE_HOME/cpp/submodules/parquet-testing/data + # * ../cpp/submodules/parquet-testing/data + # ARROW_SOURCE_HOME is set in many of our CI setups, so that will find the files + # the .. 
version should catch some (thought not all) ways of running tests locally + parquet_test_data <- file.path( +Sys.getenv("ARROW_SOURCE_HOME", test_path("..")), +"cpp", +"submodules", +"parquet-testing", +"data" + ) skip_if_not(dir.exists(parquet_test_data), "Parquet test data missing") pq <- read_parquet(paste0(parquet_test_data, "/nested_lists.snappy.parquet"), as_data_frame = FALSE) - expect_equal(pq$a$type, list_of(list_of(list_of(utf8() + expect_equal(pq$a$type, list_of(list_of(list_of(utf8(, ignore_attr = TRUE) pq <- read_parquet(paste0(parquet_test_data, "/nested_maps.snappy.parquet"), as_data_frame = FALSE) - expect_equal(pq$a$type, map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE + expect_equal( +pq$a$type, +map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE))), +ignore_attr = TRUE + ) })
[arrow] branch master updated: GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners (#15116)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 4dd5cedb21 GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners (#15116) 4dd5cedb21 is described below commit 4dd5cedb21d7b58d837bdb3c0d35a5cd80fd9f4b Author: Jacob Wujciak-Jens AuthorDate: Tue Jan 3 19:32:58 2023 +0100 GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners (#15116) * Closes: #15114 Authored-by: Jacob Wujciak-Jens Signed-off-by: Jonathan Keane --- dev/tasks/macros.jinja| 5 + dev/tasks/r/github.macos.brew.yml | 7 +-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/dev/tasks/macros.jinja b/dev/tasks/macros.jinja index 72f575a188..9cb0c0f8a8 100644 --- a/dev/tasks/macros.jinja +++ b/dev/tasks/macros.jinja @@ -235,6 +235,10 @@ on: brew unlink python@2 || true brew config brew doctor || true + # The GHA runners install of python > 3.10 is incompatible with brew so we + # have to force overwritting of the symlinks + # see https://github.com/actions/runner-images/issues/6868 + brew install --overwrite python@3.11 python@3.10 ARROW_GLIB_FORMULA=$(echo ${ARROW_FORMULA} | sed -e 's/\.rb/-glib.rb/') echo "ARROW_GLIB_FORMULA=${ARROW_GLIB_FORMULA}" >> ${GITHUB_ENV} @@ -396,3 +400,4 @@ on: {{ key }}: "{{ value }}" {% endfor %} {% endmacro %} + diff --git a/dev/tasks/r/github.macos.brew.yml b/dev/tasks/r/github.macos.brew.yml index 5f426ab42c..7cf86d999d 100644 --- a/dev/tasks/r/github.macos.brew.yml +++ b/dev/tasks/r/github.macos.brew.yml @@ -31,14 +31,17 @@ jobs: env: {{ macros.github_set_sccache_envvars()|indent(8)}} run: | + brew install sccache + # for testing + brew install minio + # TODO: Update the TODO for ARROW-16907 below to refer to main instead of master # after migrating the default branch to main. 
# TODO(ARROW-16907): apache/arrow@master seems to be installed already # so this does nothing on a branch/PR brew install -v --HEAD apache-arrow - # for testing - brew install minio + - uses: r-lib/actions/setup-r@v2 - name: Install dependencies run: |
[arrow] branch master updated (353ab45cd4 -> 13ede7bb17)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 353ab45cd4 ARROW-17684: [CI][deb] Disable Flight for arm64 (#14300)
     add 13ede7bb17 ARROW-16605: [CI][R] Fix revdep docker job (#13483)

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_revdepcheck.sh              | 58
 dev/tasks/r/github.linux.revdepcheck.yml | 57 ---
 dev/tasks/tasks.yml                      | 4 ---
 docker-compose.yml                       | 11 +++---
 r/.Rbuildignore                          | 1 +
 5 files changed, 57 insertions(+), 74 deletions(-)
 delete mode 100644 dev/tasks/r/github.linux.revdepcheck.yml
[arrow] branch master updated: MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 78d586a458 MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)
78d586a458 is described below

commit 78d586a45852b69c40b88a43d86a1c90efdf1e0d
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Tue Aug 23 23:00:56 2022 +0900

    MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)

    Authored-by: SHIMA Tatsuya
    Signed-off-by: Jonathan Keane
---
 r/man/infer_type.Rd | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/r/man/infer_type.Rd b/r/man/infer_type.Rd
index e340afa915..1bba272556 100644
--- a/r/man/infer_type.Rd
+++ b/r/man/infer_type.Rd
@@ -19,9 +19,6 @@ type(x)
 An arrow \link[=data-type]{data type}
 }
 \description{
-Infer the arrow Array type from an R object.
-}
-\details{
 \code{\link[=type]{type()}} is deprecated in favor of \code{\link[=infer_type]{infer_type()}}.
 }
 \examples{
[arrow] branch master updated: ARROW-17084: [R] Install the package before linting (#13620)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 51eb3c8adb ARROW-17084: [R] Install the package before linting (#13620)
51eb3c8adb is described below

commit 51eb3c8adb5742f8d0d05c2e371dfbc651499614
Author: Dragoș Moldovan-Grünfeld
AuthorDate: Tue Aug 2 13:49:15 2022 +0100

    ARROW-17084: [R] Install the package before linting (#13620)

    The package should be installed before running `lintr::lint_package()` or
    `lintr::expect_lint_free()` (our case); otherwise we could encounter false positives.
    See https://github.com/r-lib/lintr/issues/352#issuecomment-587004345
    and https://github.com/r-lib/lintr/issues/406#issuecomment-534601141

    Authored-by: Dragoș Moldovan-Grünfeld
    Signed-off-by: Jonathan Keane
---
 .github/workflows/r.yml | 8
 1 file changed, 8 insertions(+)

diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 4a9c605e3b..4f706e3e5b 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -327,6 +327,14 @@ jobs:
         shell: Rscript {0}
         working-directory: r
         run: |
+          Sys.setenv(
+            RWINLIB_LOCAL = file.path(Sys.getenv("GITHUB_WORKSPACE"), "r", "windows", "libarrow.zip"),
+            MAKEFLAGS = paste0("-j", parallel::detectCores()),
+            ARROW_R_DEV = TRUE,
+            "_R_CHECK_FORCE_SUGGESTS_" = FALSE
+          )
+          # we use pak for package installation since it is faster, safer and more convenient
+          pak::local_install()
           pak::pak("lintr")
           lintr::expect_lint_free()
       - name: Dump install logs
[arrow] branch master updated (036fdf2d03 -> 778d574b1a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 036fdf2d03 ARROW-17246: [Packaging][deb][RPM] Don't use system jemalloc (#13739)
     add 778d574b1a ARROW-17166: [R] [CI] force_tests() cannot return TRUE (#13680)

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/helper-skip.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow-nanoarrow] branch jonkeane-patch-1 created (now 68c9380)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch jonkeane-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git

      at 68c9380 Minor typo fix

This branch includes the following new commits:

     new 68c9380 Minor typo fix

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
[arrow-nanoarrow] 01/01: Minor typo fix
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch jonkeane-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git

commit 68c938035f51b18fa8e3f0ded079bcc8ef975c0a
Author: Jonathan Keane
AuthorDate: Wed Jul 13 10:44:13 2022 -0500

    Minor typo fix
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b1051a0..9750f30 100644
--- a/README.md
+++ b/README.md
@@ -77,5 +77,5 @@ requiring a library with a similar scope:
   along which a [mostly header-only C++ library](https://github.com/paleolimbot/geonanoarrowpp/tree/main/src/geoarrow/internal/arrow-hpp)
   was prototyped.
 - The [Arrow Database Connectivity](https://github.com/apache/arrow-adbc) C API, for which drivers
-  in theory can be written in C (which is currently difficult in practice because of there
+  in theory can be written in C (which is currently difficult in practice because there
   are few if any tools to help do this properly).
[arrow] branch master updated: ARROW-17059: [C++] Fix expression benchmark (#13584)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new b87e0c1dad ARROW-17059: [C++] Fix expression benchmark (#13584)
b87e0c1dad is described below

commit b87e0c1dad77c2d95fb979bce831a57d6ae60daa
Author: Sasha Krassovsky
AuthorDate: Tue Jul 12 12:59:16 2022 -0800

    ARROW-17059: [C++] Fix expression benchmark (#13584)

    Authored-by: Sasha Krassovsky
    Signed-off-by: Jonathan Keane
---
 cpp/src/arrow/compute/exec/expression_benchmark.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpp/src/arrow/compute/exec/expression_benchmark.cc b/cpp/src/arrow/compute/exec/expression_benchmark.cc
index 70aa509d2e..debd228498 100644
--- a/cpp/src/arrow/compute/exec/expression_benchmark.cc
+++ b/cpp/src/arrow/compute/exec/expression_benchmark.cc
@@ -80,8 +80,8 @@ static void ExecuteScalarExpressionOverhead(benchmark::State& state, Expression
       });
   std::vector<ExecBatch> inputs(num_batches);
   for (auto& batch : inputs) {
-    batch = ExecBatch({Datum(ConstantArrayGenerator::Int64(rows_per_batch, 5))},
-                      /*length=*/1);
+    batch = ExecBatch({Datum(ConstantArrayGenerator::Int64(rows_per_batch, /*value=*/5))},
+                      /*length=*/rows_per_batch);
   }

   ASSIGN_OR_ABORT(auto bound, expr.Bind(*dataset_schema));
[arrow] branch master updated (5fae150493 -> 3c1caea36a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 5fae150493 ARROW-16726: [Python] Fix Setuptools warnings about installing packages as data (#13309)
     add 3c1caea36a ARROW-16415: [R] Update `strptime` binding signature with the `tz` argument (#13190)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R                   | 50 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 67 +++-
 2 files changed, 86 insertions(+), 31 deletions(-)
[arrow] branch master updated: ARROW-16626: [C++] Name the C++ streaming execution engine
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new bc4a82fd5b ARROW-16626: [C++] Name the C++ streaming execution engine bc4a82fd5b is described below commit bc4a82fd5b65d90e97b773ca728442f369eb9951 Author: Weston Pace AuthorDate: Wed Jun 1 17:26:14 2022 -0500 ARROW-16626: [C++] Name the C++ streaming execution engine Closes #13207 from westonpace/feature/ARROW-16626--name-query-engine Lead-authored-by: Weston Pace Co-authored-by: Will Jones Co-authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- docs/source/cpp/overview.rst| 3 +++ docs/source/cpp/streaming_execution.rst | 39 + 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/docs/source/cpp/overview.rst b/docs/source/cpp/overview.rst index ccebdba45d..33f075bd18 100644 --- a/docs/source/cpp/overview.rst +++ b/docs/source/cpp/overview.rst @@ -66,6 +66,9 @@ reference. **Kernels** are specialized computation functions running in a loop over a given set of datums representing input and output parameters to the functions. +**Acero** (pronounced [aˈsɜɹo] / ah-SERR-oh) is a streaming execution engine that allows +computation to be expressed as a graph of operators which can transform streams of data. + The IO layer diff --git a/docs/source/cpp/streaming_execution.rst b/docs/source/cpp/streaming_execution.rst index 649968ad43..7ce25f587d 100644 --- a/docs/source/cpp/streaming_execution.rst +++ b/docs/source/cpp/streaming_execution.rst @@ -19,14 +19,13 @@ .. highlight:: cpp .. cpp:namespace:: arrow::compute -== -Streaming execution engine -== +=== +Acero: A C++ streaming execution engine +=== .. warning:: -The streaming execution engine is experimental, and a stable API -is not yet guaranteed. +Acero is experimental and a stable API is not yet guaranteed. 
Motivation == @@ -35,20 +34,23 @@ For many complex computations, successive direct :ref:`invocation of compute functions ` is not feasible in either memory or computation time. Doing so causes all intermediate data to be fully materialized. To facilitate arbitrarily large inputs -and more efficient resource usage, Arrow also provides a streaming query -engine with which computations can be formulated and executed. +and more efficient resource usage, the Arrow C++ implementation also +provides Acero, a streaming query engine with which computations can +be formulated and executed. .. image:: simple_graph.svg :alt: An example graph of a streaming execution workflow. -:class:`ExecNode` is provided to reify the graph of operations in a query. -Batches of data (:struct:`ExecBatch`) flow along edges of the graph from -node to node. Structuring the API around streams of batches allows the -working set for each node to be tuned for optimal performance independent -of any other nodes in the graph. Each :class:`ExecNode` processes batches -as they are pushed to it along an edge of the graph by upstream nodes -(its inputs), and pushes batches along an edge of the graph to downstream -nodes (its outputs) as they are finalized. +Acero allows computation to be expressed as an "execution plan" +(:class:`ExecPlan`) which is a directed graph of operators. Each operator +(:class:`ExecNode`) provides, transforms, or consumes the data passing +through it. Batches of data (:struct:`ExecBatch`) flow along edges of +the graph from node to node. Structuring the API around streams of batches +allows the working set for each node to be tuned for optimal performance +independent of any other nodes in the graph. Each :class:`ExecNode` +processes batches as they are pushed to it along an edge of the graph by +upstream nodes (its inputs), and pushes batches along an edge of the graph +to downstream nodes (its outputs) as they are finalized. .. 
seealso:: @@ -366,10 +368,9 @@ This function might be reading a file, iterating through an in memory structure, from a network connection. The arrow library refers to these functions as ``arrow::AsyncGenerator`` and there are a number of utilities for working with these functions. For this example we use a vector of record batches that we've already stored in memory. -In addition, the schema of the data must be known up front. Arrow's streaming execution -engine must know the schema of the data at each stage of the execution graph before any -processing has begun. This means we must supply the schema for a source node separately -from the data itself. +In addition, the schema of the data must be known up front. Acero must know the schema of the data +at each stage of the
[arrow] branch master updated: ARROW-14632: [Python] Make write_dataset arguments keyword-only
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 2ffc10a43b ARROW-14632: [Python] Make write_dataset arguments keyword-only
2ffc10a43b is described below

commit 2ffc10a43b2b9a397bfeba993993172082f9722b
Author: Austin Dickey
AuthorDate: Wed Jun 1 17:22:17 2022 -0500

    ARROW-14632: [Python] Make write_dataset arguments keyword-only

    As a best practice, most of the optional configuration arguments in
    `write_dataset()` should be keyword-only. This PR enforces that.

    Closes #13289 from austin3dickey/ARROW-14632

    Authored-by: Austin Dickey
    Signed-off-by: Jonathan Keane
---
 python/pyarrow/dataset.py            | 2 +-
 python/pyarrow/tests/test_dataset.py | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/python/pyarrow/dataset.py b/python/pyarrow/dataset.py
index 6c1b8db5a6..8ef3e2f7aa 100644
--- a/python/pyarrow/dataset.py
+++ b/python/pyarrow/dataset.py
@@ -801,7 +801,7 @@ def _ensure_write_partitioning(part, schema, flavor):
     return part


-def write_dataset(data, base_dir, basename_template=None, format=None,
+def write_dataset(data, base_dir, *, basename_template=None, format=None,
                   partitioning=None, partitioning_flavor=None, schema=None,
                   filesystem=None, file_options=None, use_threads=True,
                   max_partitions=None, max_open_files=None,
diff --git a/python/pyarrow/tests/test_dataset.py b/python/pyarrow/tests/test_dataset.py
index 0be01d2336..d2210c4b6c 100644
--- a/python/pyarrow/tests/test_dataset.py
+++ b/python/pyarrow/tests/test_dataset.py
@@ -1796,6 +1796,12 @@ def test_dictionary_partitioning_outer_nulls_raises(tempdir):
         ds.write_dataset(table, tempdir, format='ipc', partitioning=part)


+def test_positional_keywords_raises(tempdir):
+    table = pa.table({'a': ['x', 'y', None], 'b': ['x', 'y', 'z']})
+    with pytest.raises(TypeError):
+        ds.write_dataset(table, tempdir, "basename-{i}.arrow")
+
+
 @pytest.mark.parquet
 @pytest.mark.pandas
 def test_read_partition_keys_only(tempdir):
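A minimal sketch of what the keyword-only marker changes for callers. `write_dataset_sketch` below is a hypothetical stand-in that mirrors only the first few parameters of the signature in the diff above; it is not pyarrow's implementation and writes nothing.

```python
def write_dataset_sketch(data, base_dir, *, basename_template=None, format=None):
    """Hypothetical stand-in: everything after the bare * must be passed by keyword."""
    return {
        "base_dir": base_dir,
        "basename_template": basename_template,
        "format": format,
    }


def rejects_positional_template():
    """Return True if passing the template positionally raises TypeError."""
    try:
        write_dataset_sketch([1, 2, 3], "/tmp/out", "basename-{i}.arrow")
    except TypeError:
        return True
    return False


# The keyword form is unaffected by the change.
result = write_dataset_sketch(
    [1, 2, 3], "/tmp/out", basename_template="basename-{i}.arrow", format="ipc"
)
```

The bare `*` leaves `data` and `base_dir` positional, so existing calls that pass only those two positionally keep working; only calls that relied on the position of the optional arguments need updating.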
[arrow] branch master updated: MINOR: [Docs] Update auto_disconnect parameter based on ARROW-14395
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 8295bdc2e8 MINOR: [Docs] Update auto_disconnect parameter based on ARROW-14395 8295bdc2e8 is described below commit 8295bdc2e86e657c59724c3e56da474e5414cb39 Author: Will Jones AuthorDate: Wed Jun 1 17:20:14 2022 -0500 MINOR: [Docs] Update auto_disconnect parameter based on ARROW-14395 #11482 changed the default value of the parameter, but didn't update the docs for it. Closes #13290 from wjones127/minor-duckdb-doc Authored-by: Will Jones Signed-off-by: Jonathan Keane --- r/R/duckdb.R | 6 ++ r/man/to_duckdb.Rd | 6 ++ 2 files changed, 4 insertions(+), 8 deletions(-) diff --git a/r/R/duckdb.R b/r/R/duckdb.R index 3951362f8e..b924dafcab 100644 --- a/r/R/duckdb.R +++ b/r/R/duckdb.R @@ -26,9 +26,7 @@ #' If `auto_disconnect = TRUE`, the DuckDB table that is created will be configured #' to be unregistered when the `tbl` object is garbage collected. This is helpful #' if you don't want to have extra table objects in DuckDB after you've finished -#' using them. Currently, this cleanup can, however, sometimes lead to hangs if -#' tables are created and deleted in quick succession, hence the default value -#' of `FALSE` +#' using them. #' #' @param .data the Arrow object (e.g. Dataset, Table) to use for the DuckDB table #' @param con a DuckDB connection to use (default will create one and store it @@ -36,7 +34,7 @@ #' @param table_name a name to use in DuckDB for this object. The default is a #' unique string `"arrow_"` followed by numbers. #' @param auto_disconnect should the table be automatically cleaned up when the -#' resulting object is removed (and garbage collected)? Default: `FALSE` +#' resulting object is removed (and garbage collected)? 
Default: `TRUE` #' #' @return A `tbl` of the new table in DuckDB #' diff --git a/r/man/to_duckdb.Rd b/r/man/to_duckdb.Rd index 8d6a9e5c62..79c089239b 100644 --- a/r/man/to_duckdb.Rd +++ b/r/man/to_duckdb.Rd @@ -21,7 +21,7 @@ in \code{options("arrow_duck_con")})} unique string \code{"arrow_"} followed by numbers.} \item{auto_disconnect}{should the table be automatically cleaned up when the -resulting object is removed (and garbage collected)? Default: \code{FALSE}} +resulting object is removed (and garbage collected)? Default: \code{TRUE}} } \value{ A \code{tbl} of the new table in DuckDB @@ -37,9 +37,7 @@ The result is a dbplyr-compatible object that can be used in d(b)plyr pipelines. If \code{auto_disconnect = TRUE}, the DuckDB table that is created will be configured to be unregistered when the \code{tbl} object is garbage collected. This is helpful if you don't want to have extra table objects in DuckDB after you've finished -using them. Currently, this cleanup can, however, sometimes lead to hangs if -tables are created and deleted in quick succession, hence the default value -of \code{FALSE} +using them. } \examples{ \dontshow{if (getFromNamespace("run_duckdb_examples", "arrow")()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
[arrow] branch master updated: ARROW-16281: [R] [CI] Bump versions with the release of 4.2
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new ce4dcbdf5f ARROW-16281: [R] [CI] Bump versions with the release of 4.2 ce4dcbdf5f is described below commit ce4dcbdf5f1bcfb3f23b494598b9125c8e7ee52e Author: Dragoș Moldovan-Grünfeld AuthorDate: Wed May 18 13:28:23 2022 -0700 ARROW-16281: [R] [CI] Bump versions with the release of 4.2 Update hard-coded versions on R in our CI after the release of R 4.2. Closes #12980 from dragosmg/r_42_ci_update Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- .env | 2 +- .github/workflows/r.yml| 6 +++--- ci/docker/linux-apt-docs.dockerfile| 2 +- ci/docker/linux-apt-lint.dockerfile| 11 +++ ci/docker/linux-apt-r.dockerfile | 9 ++--- dev/tasks/r/github.linux.arrow.version.back.compat.yml | 1 + dev/tasks/r/github.linux.versions.yml | 3 ++- dev/tasks/tasks.yml| 10 +- 8 files changed, 18 insertions(+), 26 deletions(-) diff --git a/.env b/.env index 5c73161ac1..f56820daad 100644 --- a/.env +++ b/.env @@ -68,7 +68,7 @@ NODE=16 NUMPY=latest PANDAS=latest PYTHON=3.8 -R=4.1 +R=4.2 SPARK=master TURBODBC=latest diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml index 19abac5bb2..8de703b71a 100644 --- a/.github/workflows/r.yml +++ b/.github/workflows/r.yml @@ -57,7 +57,7 @@ jobs: strategy: fail-fast: false matrix: -r: ["4.1"] +r: ["4.2"] ubuntu: [20.04] force-tests: ["true", "false"] env: @@ -244,7 +244,7 @@ jobs: config: - { rtools: 35, rversion: "3.6" } - { rtools: 40, rversion: "4.1" } -# TODO: Once R 4.2 comes out we can switch to devel + 4.2 +- { rtools: 42, rversion: "4.2" } - { rtools: 42, rversion: "devel" } env: ARROW_R_CXXFLAGS: "-Werror" @@ -384,7 +384,7 @@ jobs: timeout = 3600 ) - name: Run lintr -if: ${{ matrix.config.rversion == '4.1' }} +if: ${{ matrix.config.rversion == '4.2' }} env: NOT_CRAN: "true" 
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/ci/docker/linux-apt-docs.dockerfile b/ci/docker/linux-apt-docs.dockerfile index 0ef1231321..3a8a9cf8e2 100644 --- a/ci/docker/linux-apt-docs.dockerfile +++ b/ci/docker/linux-apt-docs.dockerfile @@ -18,7 +18,7 @@ ARG base FROM ${base} -ARG r=4.1 +ARG r=4.2 ARG jdk=8 # See R install instructions at https://cloud.r-project.org/bin/linux/ubuntu/ diff --git a/ci/docker/linux-apt-lint.dockerfile b/ci/docker/linux-apt-lint.dockerfile index 036be1ac13..249072ae32 100644 --- a/ci/docker/linux-apt-lint.dockerfile +++ b/ci/docker/linux-apt-lint.dockerfile @@ -40,16 +40,11 @@ RUN apt-get update && \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* -ARG r=4.1 +ARG r=4.2 RUN wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \ tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \ -# NOTE: R 3.5 and 3.6 are available in the repos with -cran35 suffix -# for trusty, xenial, bionic, and eoan (as of May 2020) -# -cran40 has 4.0 versions for bionic and focal -# R 3.4 is available without the suffix but only for trusty and xenial -# TODO: make sure OS version and R version are valid together and conditionally set repo suffix -# This is a hack to turn 3.6 into 35, and 4.0/4.1 into 40: -add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu '$(lsb_release -cs)'-cran'$(echo "${r}" | tr -d . 
| tr 6 5 | tr 1 0)'/' && \ +# NOTE: Only R >= 4.0 is available in this repo +add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu '$(lsb_release -cs)'-cran40/' && \ apt-get install -y \ r-base=${r}* \ r-recommended=${r}* \ diff --git a/ci/docker/linux-apt-r.dockerfile b/ci/docker/linux-apt-r.dockerfile index 7526f78452..7083bfa3d9 100644 --- a/ci/docker/linux-apt-r.dockerfile +++ b/ci/docker/linux-apt-r.dockerfile @@ -38,13 +38,8 @@ RUN apt-get update -y && \ software-properties-common && \ wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \ tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \ -# NOTE: R 3.5 and 3.6 are available in the repos with -cran35 suffix -# for trusty, xenial, bionic, and eoan (as of May 2020) -# -cran40 has 4.0 versions for bionic an
[arrow] branch master updated (b264dca5a0 -> 214135d8ce)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from b264dca5a0 MINOR: [R] Move tzdb loading out of .onLoad() to avoid a check NOTE
     add 214135d8ce ARROW-14848: [R] Implement bindings for lubridate's parse_date_time

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    | 6 ++
 r/R/dplyr-datetime-helpers.R                 | 45 +
 r/R/dplyr-funcs-datetime.R                   | 36 +++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 96 +++-
 4 files changed, 182 insertions(+), 1 deletion(-)
[arrow] branch master updated: ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 3c03d49364 ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows 3c03d49364 is described below commit 3c03d4936445781e29e41392d9a0bc3db62b39f2 Author: Dragoș Moldovan-Grünfeld AuthorDate: Fri Apr 29 17:34:09 2022 -0500 ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows Closes #12883 from dragosmg/datetime_unit_testing_cleanup Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- r/tests/testthat/test-dplyr-funcs-datetime.R | 5 - r/tests/testthat/test-dplyr-funcs-type.R | 7 +-- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R b/r/tests/testthat/test-dplyr-funcs-datetime.R index a4c5ee3c22..47626a6cb1 100644 --- a/r/tests/testthat/test-dplyr-funcs-datetime.R +++ b/r/tests/testthat/test-dplyr-funcs-datetime.R @@ -841,8 +841,6 @@ test_that("month() supports integer input", { test_df_month ) -skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168 - compare_dplyr_binding( .input %>% # R returns ordered factor whereas Arrow returns character @@ -904,8 +902,6 @@ test_that("month() errors with double input and returns NA with int outside 1:12 }) test_that("date works in arrow", { - # https://issues.apache.org/jira/browse/ARROW-13168 - skip_on_os("windows") # this date is specific since lubridate::date() is different from base::as.Date() # since as.Date returns the UTC date and date() doesn't test_df <- tibble( @@ -1123,7 +1119,6 @@ test_that("difftime works correctly", { ignore_attr = TRUE ) - skip_on_os("windows") test_df_with_tz <- tibble( time1 = as.POSIXct( c("2021-02-20", "2021-07-31", "2021-10-30", "2021-01-31"), diff --git a/r/tests/testthat/test-dplyr-funcs-type.R 
b/r/tests/testthat/test-dplyr-funcs-type.R index 6a07d36e81..e4283e39b5 100644 --- a/r/tests/testthat/test-dplyr-funcs-type.R +++ b/r/tests/testthat/test-dplyr-funcs-type.R @@ -873,7 +873,6 @@ test_that("`as.Date()` and `as_date()`", { fixed = TRUE ) - # we do not support as.Date() with double/ float (error surfaced from C++) # TODO revisit after https://issues.apache.org/jira/browse/ARROW-15798 expect_error( @@ -958,7 +957,11 @@ test_that("`as_datetime()`", { }) test_that("format date/time", { - skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168 + # locale issues + # TODO revisit after https://issues.apache.org/jira/browse/ARROW-16399 is done + if (tolower(Sys.info()[["sysname"]]) == "windows") { +withr::local_locale(LC_TIME = "C") + } # In 3.4 the lack of tzone attribute causes spurious failures skip_if_r_version("3.4.4")
[arrow] branch master updated: ARROW-16373: [Docs][CI] Small improvements to CI documentation
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new e0061bbb8c ARROW-16373: [Docs][CI] Small improvements to CI documentation e0061bbb8c is described below commit e0061bbb8cd269f4e8b880d1a4ba181312cbc07f Author: Dragoș Moldovan-Grünfeld AuthorDate: Wed Apr 27 17:24:47 2022 -0500 ARROW-16373: [Docs][CI] Small improvements to CI documentation Closes #12989 from dragosmg/patch-1 Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- .../developers/continuous_integration/overview.rst | 14 +++--- .../developers/guide/step_by_step/arrow_codebase.rst | 12 ++-- .../developers/guide/step_by_step/pr_lifecycle.rst | 18 +- r/vignettes/developers/workflow.Rmd| 4 ++-- 4 files changed, 24 insertions(+), 24 deletions(-) diff --git a/docs/source/developers/continuous_integration/overview.rst b/docs/source/developers/continuous_integration/overview.rst index 73aef370cd..3c21c17063 100644 --- a/docs/source/developers/continuous_integration/overview.rst +++ b/docs/source/developers/continuous_integration/overview.rst @@ -31,18 +31,18 @@ Some files central to Arrow CI are: We use :ref:`Docker` in order to have portable and reproducible Linux builds, as well as running Windows builds in Windows containers. We use :ref:`Archery` and :ref:`Crossbow` to help co-ordinate the various CI tasks. -One thing to note is the some of the services defined in ``docker-compose.yml`` are interdependent. When running services locally, you must either manually build its dependencies first, or build it via the use of ``archery run ...`` which automatically finds and builds dependencies. +One thing to note is the some of the services defined in ``docker-compose.yml`` are interdependent. 
When running services locally, you must either manually build its dependencies first, or build it via the use of ``archery run ...`` which automatically finds and builds dependencies. There are numerous important directories in the Arrow project which relate to CI: - ``.github/worflows`` - workflows that are run via GitHub actions and are triggered by things like pull requests being submitted or merged -- ``dev/tasks`` - containing on-demand jobs triggered/submitted via ``archery crossbow submit ...``, typically nightly builds or relating to the release process +- ``dev/tasks`` - containing extended jobs triggered/submitted via ``archery crossbow submit ...``, typically nightly builds or relating to the release process - ``ci/`` - containing scripts, dockerfiles, and any supplemental files, e.g. patch files, conda environment files, vcpkg triplet files. Instead of thinking about Arrow CI in terms of files and folders, it may be conceptually simpler to instead divide it into 2 main categories: -- CI jobs which are triggered based on specific actions on GitHub (pull requests opened, pull requests merged, etc) -- On-demand builds which are manually triggered on a nightly basis or via Archery +- **action-triggered builds**: CI jobs which are triggered based on specific actions on GitHub (pull requests opened, pull requests merged, etc) +- **extended builds**: manually triggered with many being run on a nightly basis Action-triggered builds --- @@ -61,9 +61,9 @@ The ``.yml`` files in ``.github/worflows`` are workflows which are run on GitHub There are two other files which define action-triggered builds: - ``.travis.yml`` - runs on all commits and is used to test on architectures such as ARM and S390x -- ``appveyor.yml`` - runs on commits related to Python or C++ +- ``appveyor.yml`` - runs on commits related to Python or C++ -On-demand builds +Extended builds --- Crossbow is a subcomponent of Archery and can be used to manually trigger builds. 
The tasks which can be run on Crossbow can be found in the ``dev/tasks`` directory. This directory contains: @@ -73,4 +73,4 @@ Crossbow is a subcomponent of Archery and can be used to manually trigger builds Most of these tasks are run as part of the nightly builds, though also can be triggered manually by add a comment to a PR which begins with ``@github-actions crossbow submit`` followed by the name of the task to be run. -For convenience purpose, the tasks in ``dev/tasks/tasks.yml`` are defined in groups, which makes it simpler for multiple tasks to be submitted to Crossbow at once. The task definitions here contain information about which service defined in ``docker-compose.yml`` to run, the CI service to run the task on, and which template file to use as the basis for that task. \ No newline at end of file +For convenience purpose, the tasks in ``dev/tasks/tasks.yml`` are defined in groups, which
[arrow] branch master updated (24f372297c -> d92777270b)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 24f372297c ARROW-16294: [C++] Improve performance of parquet readahead
     add d92777270b ARROW-16325: [R] Add task for R package with gcc12

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)
[arrow] branch master updated: ARROW-16374: [R] [C++] skip another snappy test during sanitizer runs
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new f03f090d0c ARROW-16374: [R] [C++] skip another snappy test during sanitizer runs
f03f090d0c is described below

commit f03f090d0c9fd6c85e046e2790c5d443729f6b30
Author: Jonathan Keane
AuthorDate: Wed Apr 27 15:14:58 2022 -0500

    ARROW-16374: [R] [C++] skip another snappy test during sanitizer runs

    Another example of https://github.com/google/snappy/pull/148

    Closes #13014 from jonkeane/ARROW-16374

    Authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/tests/testthat/test-parquet.R | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R
index dbafd5d62c..1737b7100c 100644
--- a/r/tests/testthat/test-parquet.R
+++ b/r/tests/testthat/test-parquet.R
@@ -197,6 +197,8 @@ test_that("Maps are preserved when writing/reading from Parquet", {
 })

 test_that("read_parquet() and write_parquet() accept connection objects", {
+  skip_if_not_available("snappy")
+
   tf <- tempfile()
   on.exit(unlink(tf))
[arrow-site] branch master updated: ARROW-16244: [Website] Arrow for R cheatsheet blog post (#204)
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow-site.git The following commit(s) were added to refs/heads/master by this push: new e7fe76ae3f ARROW-16244: [Website] Arrow for R cheatsheet blog post (#204) e7fe76ae3f is described below commit e7fe76ae3f92be639fe0ccf7d2a6d0fe47cf775e Author: Stephanie Hazlitt AuthorDate: Wed Apr 27 07:39:25 2022 -0700 ARROW-16244: [Website] Arrow for R cheatsheet blog post (#204) * add cheatsheet thumbnail png * add cheatsheet post md file * use local contributors for author in yaml * tweak image+header sizes * centre png * edit and use r cookbook url Co-authored-by: Nicola Crane * minor edit Co-authored-by: Nic Crane * add Apache to title Co-authored-by: Neal Richardson * add Apache in first ref Co-authored-by: Neal Richardson * bump publish date * update img date * update filename date * update thumbnail url Co-authored-by: Nicola Crane Co-authored-by: Neal Richardson --- _data/contributors.yml| 3 ++ _posts/2022-04-27-arrow-r-cheatsheet.md | 49 ++ img/20220427-arrow-r-cheatsheet-thumbnail.png | Bin 0 -> 814228 bytes 3 files changed, 52 insertions(+) diff --git a/_data/contributors.yml b/_data/contributors.yml index 3471b50c67..0a01adb255 100644 --- a/_data/contributors.yml +++ b/_data/contributors.yml @@ -55,4 +55,7 @@ - name: Ruan Pearce-Authers apacheId: ruanpa # Not a real apacheId githubId: returnString +- name: Stephanie Hazlitt + apacheId: stephhazlitt + githubId: stephhazlitt # End contributors.yml diff --git a/_posts/2022-04-27-arrow-r-cheatsheet.md b/_posts/2022-04-27-arrow-r-cheatsheet.md new file mode 100644 index 00..3e7651097d --- /dev/null +++ b/_posts/2022-04-27-arrow-r-cheatsheet.md @@ -0,0 +1,49 @@ +--- +layout: post +title: Apache Arrow for R Cheatsheet +date: "2022-04-27 00:00:00" +author: stephhazlitt +categories: [application] +--- + + +We are excited to introduce the new [Apache Arrow for R 
Cheatsheet](https://github.com/apache/arrow/blob/master/r/cheatsheet/arrow-cheatsheet.pdf). + + +https://github.com/apache/arrow/blob/master/r/cheatsheet/arrow-cheatsheet.pdf + + + + +## Helping (Not Cheating) + +While [cheatsheets](https://en.wikipedia.org/wiki/Cheat_sheet) may have started as a set of notes used without an instructor’s knowledge (so, ummm, cheating), using the Arrow for R cheatsheet is definitely not cheating! Today, cheatsheets are a common tool to provide users an introduction to software’s functionality and a quick reference guide to help users get started. + +The Arrow for R cheatsheet is intended to be an easy-to-scan introduction to the Arrow R package and Arrow data structures, with getting started sections on some of the package’s main functionality. The cheatsheet includes introductory snippets on using Arrow to read and work with larger-than-memory multi-file data sets, sending and receiving data with Flight, reading data from cloud storage without downloading the data first, and more. The Arrow for R cheatsheet also directs users to th [...] + +## Cheatsheet Maintenance + +See something that needs updating? Or want to suggest a change? Like software itself, a package cheatsheet needs maintenance to keep pace with new features or user-facing changes. Contributions can be made by downloading and making changes to the [`arrow-cheatsheet.pptx` file](https://github.com/apache/arrow/tree/master/r/cheatsheet) (in Microsoft PowerPoint or Google Slides), and offering the revised `.pptx` and rendered PDF back to the project following the _new_ [New Contributors Guid [...] + +## By the Community For the Community + +The Arrow for R cheatsheet was initiated by Mauricio (Pachá) Vargas Sepúlveda ([ARROW-13616](https://issues.apache.org/jira/browse/ARROW-13616)) and was co-developed and reviewed by many Apache Arrow community members. 
The cheatsheet was created by the community for the community, and anyone in the Arrow community is welcome and encouraged to help with maintenance and offer improvements. Thank you for your support! \ No newline at end of file diff --git a/img/20220427-arrow-r-cheatsheet-thumbnail.png b/img/20220427-arrow-r-cheatsheet-thumbnail.png new file mode 100644 index 00..ecd3b0d763 Binary files /dev/null and b/img/20220427-arrow-r-cheatsheet-thumbnail.png differ
[arrow] branch master updated (a16be6b7b6 -> e1e782a454)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from a16be6b7b6 ARROW-16121: [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset
     add e1e782a454 ARROW-15015: [R] Test / CI flag for ensuring all tests are run?

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml        |  5 +++--
 ci/scripts/r_test.sh           |  6 ++
 r/tests/testthat/helper-skip.R | 34 ++
 3 files changed, 43 insertions(+), 2 deletions(-)
[arrow] branch master updated: ARROW-15800 [R] Implement bindings for `lubridate::as_date()` and `lubridate::as_datetime()`
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 16638a4452 ARROW-15800 [R] Implement bindings for `lubridate::as_date()` and `lubridate::as_datetime()` 16638a4452 is described below commit 16638a445201e7bf61358c96a6e70ab81df8a001 Author: Dragoș Moldovan-Grünfeld AuthorDate: Fri Apr 22 11:08:59 2022 -0500 ARROW-15800 [R] Implement bindings for `lubridate::as_date()` and `lubridate::as_datetime()` Closes #12738 from dragosmg/as_date_as_datetime_take2 Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- r/NEWS.md| 1 + r/R/dplyr-funcs-datetime.R | 61 +- r/R/dplyr-funcs-type.R | 83 +++ r/man/arrow-package.Rd | 4 +- r/tests/testthat/test-dplyr-funcs-type.R | 132 +-- 5 files changed, 222 insertions(+), 59 deletions(-) diff --git a/r/NEWS.md b/r/NEWS.md index eb5cd9a155..71a4d0be73 100644 --- a/r/NEWS.md +++ b/r/NEWS.md @@ -27,6 +27,7 @@ * Added `make_difftime()` (duration constructor) * Added duration helper functions: `dyears()`, `dmonths()`, `dweeks()`, `ddays()`, `dhours()`, `dminutes()`, `dseconds()`, `dmilliseconds()`, `dmicroseconds()`, `dnanoseconds()`. * date-time functionality: + * Added `as_date()` and `as_datetime()` * Added `difftime` and `as.difftime()` * Added `as.Date()` to convert to date * `median()` and `quantile()` will warn once about approximate calculations regardless of interactivity. 
diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R index a674a6402b..a6bc79ec7c 100644 --- a/r/R/dplyr-funcs-datetime.R +++ b/r/R/dplyr-funcs-datetime.R @@ -263,11 +263,11 @@ register_bindings_duration <- function() { # cast to timestamp if time1 and time2 are not dates or timestamp expressions # (the subtraction of which would output a `duration`) if (!call_binding("is.instant", time1)) { - time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp(timezone = "UTC"))) + time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp())) } if (!call_binding("is.instant", time2)) { - time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC"))) + time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp())) } # if time1 or time2 are timestamps they cannot be expressed in "s" /seconds @@ -476,3 +476,60 @@ duration_from_chunks <- function(chunks) { } duration } + +binding_as_date <- function(x, +format = NULL, +tryFormats = "%Y-%m-%d", +origin = "1970-01-01") { + + if (is.null(format) && length(tryFormats) > 1) { +abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow") + } + + if (call_binding("is.Date", x)) { +return(x) + +# cast from character + } else if (call_binding("is.character", x)) { +x <- binding_as_date_character(x, format, tryFormats) + +# cast from numeric + } else if (call_binding("is.numeric", x)) { +x <- binding_as_date_numeric(x, origin) + } + + build_expr("cast", x, options = cast_options(to_type = date32())) +} + +binding_as_date_character <- function(x, + format = NULL, + tryFormats = "%Y-%m-%d") { + format <- format %||% tryFormats[[1]] + # unit = 0L is the identifier for seconds in valid_time32_units + build_expr("strptime", x, options = list(format = format, unit = 0L)) +} + +binding_as_date_numeric <- function(x, origin = "1970-01-01") { + + # Arrow does not support direct casting from double to date32(), but for + # 
integer-like values we can go via int32() + # https://issues.apache.org/jira/browse/ARROW-15798 + # TODO revisit if arrow decides to support double -> date casting + if (!call_binding("is.integer", x)) { +x <- build_expr("cast", x, options = cast_options(to_type = int32())) + } + + if (origin != "1970-01-01") { +delta_in_sec <- call_binding("difftime", origin, "1970-01-01") +# TODO: revisit once either of these issues is addressed: +# https://issues.apache.org/jira/browse/ARROW-16253 (helper function for +# casting from double to duration) or +# https://issues.apache.org/jira/browse/ARROW-15862 (casting from int32 +# -> duration or doub
[arrow] branch master updated (0ce8ce8b19 -> c4b646e715)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 0ce8ce8b19 ARROW-11415: [R] map_batches wouldn't accept a dataset as an argument
     add c4b646e715 ARROW-14942: [R] Bindings for lubridate's dpicoseconds, dnanoseconds, desconds, dmilliseconds, dmicroseconds

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  1 +
 r/R/dplyr-funcs-datetime.R                   | 63 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 80 +++-
 3 files changed, 118 insertions(+), 26 deletions(-)
[arrow] branch master updated: ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch Linux
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new f7bccc51cc ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch Linux f7bccc51cc is described below commit f7bccc51cc8ab384134ee50a8dd0af03d937e8cd Author: Jonathan Keane AuthorDate: Thu Apr 21 16:50:14 2022 -0500 ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch Linux Closes #11666 from jonkeane/ARROW-14638-ccache Lead-authored-by: Jonathan Keane Co-authored-by: Neal Richardson Signed-off-by: Jonathan Keane --- .env | 6 -- ci/docker/linux-r.dockerfile | 3 +++ ci/scripts/r_docker_configure.sh | 24 dev/tasks/r/azure.linux.yml | 1 + dev/tasks/tasks.yml | 9 + docker-compose.yml | 1 + r/tools/nixlibs.R| 10 -- 7 files changed, 50 insertions(+), 4 deletions(-) diff --git a/.env b/.env index a972654497..629dd04980 100644 --- a/.env +++ b/.env @@ -73,11 +73,13 @@ SPARK=master TURBODBC=latest # These correspond to images on Docker Hub that contain R, e.g. 
rhub/ubuntu-gcc-release:latest -ARROW_R_DEV=TRUE R_IMAGE=ubuntu-gcc-release R_ORG=rhub -R_PRUNE_DEPS=FALSE R_TAG=latest + +# Env vars for R builds +ARROW_R_DEV=TRUE +R_PRUNE_DEPS=FALSE TZ=UTC # -1 does not attempt to install a devtoolset version, any positive integer will install devtoolset-n diff --git a/ci/docker/linux-r.dockerfile b/ci/docker/linux-r.dockerfile index 1cbde3207e..804fb09f09 100644 --- a/ci/docker/linux-r.dockerfile +++ b/ci/docker/linux-r.dockerfile @@ -33,6 +33,9 @@ ENV DEVTOOLSET_VERSION=${devtoolset_version} ARG r_prune_deps=FALSE ENV R_PRUNE_DEPS=${r_prune_deps} +ARG r_custom_ccache=false +ENV R_CUSTOM_CCACHE=${r_custom_ccache} + ARG tz="UTC" ENV TZ=${tz} diff --git a/ci/scripts/r_docker_configure.sh b/ci/scripts/r_docker_configure.sh index 518df1040d..9f93ba2b61 100755 --- a/ci/scripts/r_docker_configure.sh +++ b/ci/scripts/r_docker_configure.sh @@ -42,6 +42,30 @@ else apt-get update fi +# Enable ccache if requested based on http://dirk.eddelbuettel.com/blog/2017/11/27/ +: ${R_CUSTOM_CCACHE:=FALSE} +R_CUSTOM_CCACHE=`echo $R_CUSTOM_CCACHE | tr '[:upper:]' '[:lower:]'` +if [ ${R_CUSTOM_CCACHE} = "true" ]; then + # install ccache + $PACKAGE_MANAGER install -y epel-release || true + $PACKAGE_MANAGER install -y ccache + + mkdir -p ~/.R + echo "VER= +CCACHE=ccache +CC=\$(CCACHE) gcc\$(VER) +CXX=\$(CCACHE) g++\$(VER) +CXX11=\$(CCACHE) g++\$(VER)" >> ~/.R/Makevars + + mkdir -p ~/.ccache/ + echo "max_size = 5.0G +# important for R CMD INSTALL *.tar.gz as tarballs are expanded freshly -> fresh ctime +sloppiness = include_file_ctime +# also important as the (temp.) 
directory name will differ +hash_dir = false" >> ~/.ccache/ccache.conf +fi + + # Special hacking to try to reproduce quirks on fedora-clang-devel on CRAN # which uses a bespoke clang compiled to use libc++ # https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang diff --git a/dev/tasks/r/azure.linux.yml b/dev/tasks/r/azure.linux.yml index 50b27aa7be..fd48141961 100644 --- a/dev/tasks/r/azure.linux.yml +++ b/dev/tasks/r/azure.linux.yml @@ -43,6 +43,7 @@ jobs: export R_IMAGE={{ r_image }} export R_TAG={{ r_tag }} export DEVTOOLSET_VERSION={{ devtoolset_version|default("-1") }} + export R_CUSTOM_CCACHE={{ r_custom_ccache|default("false") }} docker-compose pull --ignore-pull-failures r docker-compose build r displayName: Docker build diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml index b45dec61ff..a6a41ca274 100644 --- a/dev/tasks/tasks.yml +++ b/dev/tasks/tasks.yml @@ -1279,6 +1279,15 @@ tasks: template: r/github.linux.offline.build.yml + test-r-rhub-debian-gcc-release-custom-ccache: +ci: azure +template: r/azure.linux.yml +params: + r_org: rhub + r_image: debian-gcc-release + r_tag: latest + r_custom_ccache: true + {% for r_org, r_image, r_tag in [("rhub", "ubuntu-gcc-release", "latest"), ("rocker", "r-base", "latest"), ("rstudio", "r-base", "4.1-focal"), diff --git a/docker-compose.yml b/docker-compose.yml index f3c67fc4af..cff1a1665c 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1209,6 +1209,7 @@ services: devtoolset_version: ${DEVTOOLSET_VERSION} tz: ${TZ} r_prune_deps: ${R_PRUNE_DEPS} +r_custom_ccache: ${R_CUSTOM_CCACHE} shm_size: *shm-size environment: LIBARROW_DOWNL
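The opt-in logic that this commit adds to `ci/scripts/r_docker_configure.sh` can be sketched in isolation as follows. This is a minimal illustration, not the script itself: `normalize_flag` and `write_ccache_makevars` are hypothetical helper names introduced here, and the heredoc is a substitution for the escaped `echo "..."` used in the diff. The Makevars contents are the ones shown in the diff above.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_flag and write_ccache_makevars are hypothetical
# helper names, not functions from the Arrow CI scripts.

# Lowercase a TRUE/True/true style flag; an unset or empty value becomes "false".
normalize_flag() {
  printf '%s' "${1:-false}" | tr '[:upper:]' '[:lower:]'
}

# Append ccache-wrapped compiler settings to <dir>/Makevars.
# The quoted heredoc keeps $(CCACHE) and $(VER) literal, so that make,
# not the shell, expands them later.
write_ccache_makevars() {
  mkdir -p "$1"
  cat >> "$1/Makevars" <<'EOF'
VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER)
EOF
}

# Example use (commented out so that sourcing this file has no side effects):
#   if [ "$(normalize_flag "$R_CUSTOM_CCACHE")" = "true" ]; then
#     write_ccache_makevars "$HOME/.R"
#   fi
```

Normalizing the flag first means `TRUE`, `True` and `true` all enable the feature, which matters here because the `.env` file uses upper-case `TRUE`/`FALSE` while the new Docker build argument defaults to lower-case `false`.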
[arrow] branch master updated: ARROW-12659: [C++] Support is_valid as a guarantee
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 0e03af446c ARROW-12659: [C++] Support is_valid as a guarantee 0e03af446c is described below commit 0e03af446c328d0ef963510c3292cb14e092b917 Author: David Li AuthorDate: Thu Apr 21 13:55:15 2022 -0500 ARROW-12659: [C++] Support is_valid as a guarantee This rebases #10253 and fixes it up to also address ARROW-15312, including a regression test. This refactors how inequalities, is_valid, and is_null are treated in expression simplification, and updates the guarantees that the Parquet/Datasets emits for row groups to properly reflect nullability. Closes #12891 from lidavidm/arrow-12659 Lead-authored-by: David Li Co-authored-by: Benjamin Kietzman Co-authored-by: Antoine Pitrou Signed-off-by: Jonathan Keane --- cpp/src/arrow/compute/exec/expression.cc | 405 +++-- cpp/src/arrow/compute/exec/expression.h| 7 +- cpp/src/arrow/compute/exec/expression_test.cc | 137 ++- cpp/src/arrow/compute/kernels/scalar_validity.cc | 72 +++- .../arrow/compute/kernels/scalar_validity_test.cc | 19 + cpp/src/arrow/dataset/file_csv_test.cc | 1 + cpp/src/arrow/dataset/file_ipc_test.cc | 1 + cpp/src/arrow/dataset/file_orc_test.cc | 1 + cpp/src/arrow/dataset/file_parquet.cc | 26 +- cpp/src/arrow/dataset/file_parquet_test.cc | 21 +- cpp/src/arrow/dataset/test_util.h | 26 +- cpp/src/arrow/type.h | 2 +- cpp/src/arrow/util/stl_util_test.cc| 7 + cpp/src/arrow/util/vector.h| 4 +- docs/source/cpp/compute.rst| 9 +- docs/source/python/api/compute.rst | 1 + 16 files changed, 570 insertions(+), 169 deletions(-) diff --git a/cpp/src/arrow/compute/exec/expression.cc b/cpp/src/arrow/compute/exec/expression.cc index 1ef5c6e7b9..8f7a9a1c8c 100644 --- a/cpp/src/arrow/compute/exec/expression.cc +++ b/cpp/src/arrow/compute/exec/expression.cc @@ -34,6 +34,7 @@ #include 
"arrow/util/optional.h" #include "arrow/util/string.h" #include "arrow/util/value_parsing.h" +#include "arrow/util/vector.h" namespace arrow { @@ -110,7 +111,7 @@ namespace { std::string PrintDatum(const Datum& datum) { if (datum.is_scalar()) { -if (!datum.scalar()->is_valid) return "null"; +if (!datum.scalar()->is_valid) return "null[" + datum.type()->ToString() + "]"; switch (datum.type()->id()) { case Type::STRING: @@ -129,6 +130,8 @@ std::string PrintDatum(const Datum& datum) { } return datum.scalar()->ToString(); + } else if (datum.is_array()) { +return "Array[" + datum.type()->ToString() + "]"; } return datum.ToString(); } @@ -305,19 +308,49 @@ bool Expression::IsNullLiteral() const { return false; } -bool Expression::IsSatisfiable() const { - if (type() && type()->id() == Type::NA) { -return false; +namespace { +util::optional GetNullHandling( +const Expression::Call& call) { + DCHECK_NE(call.function, nullptr); + if (call.function->kind() == compute::Function::SCALAR) { +return static_cast(call.kernel)->null_handling; } + return util::nullopt; +} +} // namespace + +bool Expression::IsSatisfiable() const { + if (!type()) return true; + if (type()->id() != Type::BOOL) return true; if (auto lit = literal()) { if (lit->null_count() == lit->length()) { return false; } -if (lit->is_scalar() && lit->type()->id() == Type::BOOL) { +if (lit->is_scalar()) { return lit->scalar_as().value; } + +return true; + } + + if (field_ref()) return true; + + auto call = CallNotNull(*this); + + // invert(true_unless_null(x)) is always false or null by definition + // true_unless_null arises in simplification of inequalities below + if (call->function_name == "invert") { +if (auto nested_call = call->arguments[0].call()) { + if (nested_call->function_name == "true_unless_null") return false; +} + } + + if (call->function_name == "and_kleene" || call->function_name == "and") { +for (const Expression& arg : call->arguments) { + if (!arg.IsSatisfiable()) return false; +} } return 
true; @@ -370,9 +403,11 @@ Result BindNonRecursive(Expression::Call call, bool insert_implicit_ compute::KernelContext kernel_context(exec_context); if (call.kernel->init) { +const FunctionOptions* options = +call.options ? call.opti
[arrow] branch master updated (20bc63a820 -> c73870acdc)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 20bc63a820 ARROW-16242: [Go] xerrors.Errorf and xerrors.Is are deprecated, fix linting
     add c73870acdc ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

No new revisions were added by this update.

Summary of changes:
 cpp/thirdparty/download_dependencies.sh |  8 ---
 r/.gitignore                            |  1 +
 r/R/install-arrow.R                     | 25 --
 .../tools/download_dependencies_R.sh    | 19
 4 files changed, 35 insertions(+), 18 deletions(-)
 copy cpp/thirdparty/download_dependencies.sh => r/tools/download_dependencies_R.sh (74%)
[arrow] branch master updated (1f43abc933 -> 7ae86de86b)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

    from 1f43abc933 ARROW-16074: [Docs] Document joins
     add 7ae86de86b ARROW-14944 [R] Implement `lubridate::make_difftime()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  1 +
 r/R/dplyr-funcs-datetime.R                   | 66 +
 r/R/dplyr-funcs.R                            |  1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 89
 4 files changed, 157 insertions(+)
[arrow] branch master updated: ARROW-15517: [R] Use WriteNode in write_dataset()
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 4b3f4677b9 ARROW-15517: [R] Use WriteNode in write_dataset() 4b3f4677b9 is described below commit 4b3f4677b995cb7263e4a4e65daf00189f638617 Author: Neal Richardson AuthorDate: Tue Apr 19 16:40:57 2022 -0500 ARROW-15517: [R] Use WriteNode in write_dataset() This should allow streaming writes in more cases, e.g. with a join. Closes #12316 from nealrichardson/write-node Authored-by: Neal Richardson Signed-off-by: Jonathan Keane --- r/R/arrowExports.R| 8 ++-- r/R/dataset-format.R | 4 +- r/R/dataset-write.R | 87 +++ r/R/dplyr.R | 11 - r/R/metadata.R| 22 - r/R/parquet.R | 38 --- r/R/query-engine.R| 29 +--- r/src/arrowExports.cpp| 62 + r/src/compute-exec.cpp| 49 ++-- r/src/dataset.cpp | 24 -- r/tests/testthat/test-dataset-write.R | 70 r/tests/testthat/test-metadata.R | 36 --- 12 files changed, 291 insertions(+), 149 deletions(-) diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R index 7bf77f1e66..6b969336c9 100644 --- a/r/R/arrowExports.R +++ b/r/R/arrowExports.R @@ -420,6 +420,10 @@ ExecNode_Scan <- function(plan, dataset, filter, materialized_field_names) { .Call(`_arrow_ExecNode_Scan`, plan, dataset, filter, materialized_field_names) } +ExecPlan_Write <- function(plan, final_node, metadata, file_write_options, filesystem, base_dir, partitioning, basename_template, existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group) { + invisible(.Call(`_arrow_ExecPlan_Write`, plan, final_node, metadata, file_write_options, filesystem, base_dir, partitioning, basename_template, existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group)) +} + ExecNode_Filter <- function(input, filter) { .Call(`_arrow_ExecNode_Filter`, input, 
filter) } @@ -748,10 +752,6 @@ dataset___Scanner__schema <- function(sc) { .Call(`_arrow_dataset___Scanner__schema`, sc) } -dataset___Dataset__Write <- function(file_write_options, filesystem, base_dir, partitioning, basename_template, scanner, existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group) { - invisible(.Call(`_arrow_dataset___Dataset__Write`, file_write_options, filesystem, base_dir, partitioning, basename_template, scanner, existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group)) -} - dataset___Scanner__TakeRows <- function(scanner, indices) { .Call(`_arrow_dataset___Scanner__TakeRows`, scanner, indices) } diff --git a/r/R/dataset-format.R b/r/R/dataset-format.R index f00efd0350..acc1a41b02 100644 --- a/r/R/dataset-format.R +++ b/r/R/dataset-format.R @@ -390,7 +390,7 @@ ParquetFragmentScanOptions$create <- function(use_buffered_stream = FALSE, FileWriteOptions <- R6Class("FileWriteOptions", inherit = ArrowObject, public = list( -update = function(table, ...) { +update = function(column_names, ...) { check_additional_args <- function(format, passed_args) { if (format == "parquet") { supported_args <- names(formals(write_parquet)) @@ -437,7 +437,7 @@ FileWriteOptions <- R6Class("FileWriteOptions", if (self$type == "parquet") { dataset___ParquetFileWriteOptions__update( self, - ParquetWriterProperties$create(table, ...), + ParquetWriterProperties$create(column_names, ...), ParquetArrowWriterProperties$create(...) 
) } else if (self$type == "ipc") { diff --git a/r/R/dataset-write.R b/r/R/dataset-write.R index d7c73908e7..09b3ebdbe6 100644 --- a/r/R/dataset-write.R +++ b/r/R/dataset-write.R @@ -136,41 +136,88 @@ write_dataset <- function(dataset, if (inherits(dataset, "arrow_dplyr_query")) { # partitioning vars need to be in the `select` schema dataset <- ensure_group_vars(dataset) - } else if (inherits(dataset, "grouped_df")) { -force(partitioning) -# Drop the grouping metadata before writing; we've already consumed it -# now to construct `partitioning` and don't want it in the metadata$r -dataset <- dplyr::ungroup(dataset) + } else { +if (inherits(dataset, "grouped_df")) { + force(partitioning) +
[arrow] branch master updated: ARROW-16201: [R] SafeCallIntoR on 3.4
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 63d2a9c856 ARROW-16201: [R] SafeCallIntoR on 3.4
63d2a9c856 is described below

commit 63d2a9c856969a2c05e12ae8857a135bceaf45c1
Author: Jonathan Keane
AuthorDate: Thu Apr 14 15:28:44 2022 -0500

    ARROW-16201: [R] SafeCallIntoR on 3.4

    Disabling the tests for now, 3.4 will no longer be in our support window shortly with the release of 4.2

    also skip a number of tests that failed because of `tzone` being non-present on 3.4.

    Closes #12887 from jonkeane/ARROW-16201

    Authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/tests/testthat/test-dplyr-funcs-datetime.R | 2 ++
 r/tests/testthat/test-dplyr-funcs-type.R     | 2 ++
 r/tests/testthat/test-safe-call-into-r.R     | 1 +
 3 files changed, 5 insertions(+)

diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R b/r/tests/testthat/test-dplyr-funcs-datetime.R
index 79b922f6e2..fc030779ec 100644
--- a/r/tests/testthat/test-dplyr-funcs-datetime.R
+++ b/r/tests/testthat/test-dplyr-funcs-datetime.R
@@ -16,6 +16,8 @@
 # under the License.

 skip_if(on_old_windows())
+# In 3.4 the lack of tzone attribute causes spurious failures
+skip_if_r_version("3.4.4")

 library(lubridate, warn.conflicts = FALSE)
 library(dplyr, warn.conflicts = FALSE)
diff --git a/r/tests/testthat/test-dplyr-funcs-type.R b/r/tests/testthat/test-dplyr-funcs-type.R
index 6c9d9ac07a..aa6667420c 100644
--- a/r/tests/testthat/test-dplyr-funcs-type.R
+++ b/r/tests/testthat/test-dplyr-funcs-type.R
@@ -877,6 +877,8 @@ test_that("as.Date() converts successfully from date, timestamp, integer, char a
 test_that("format date/time", {
   skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+  # In 3.4 the lack of tzone attribute causes spurious failures
+  skip_if_r_version("3.4.4")
   times <- tibble(
     datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Pacific/Marquesas"), NA),
diff --git a/r/tests/testthat/test-safe-call-into-r.R b/r/tests/testthat/test-safe-call-into-r.R
index e9438de58b..55cb68abdd 100644
--- a/r/tests/testthat/test-safe-call-into-r.R
+++ b/r/tests/testthat/test-safe-call-into-r.R
@@ -46,6 +46,7 @@ test_that("SafeCallIntoR works within RunWithCapturedR", {
 })
 test_that("SafeCallIntoR errors from the non-R thread", {
+  skip_if_r_version("3.4.4")
   skip_on_cran()
   expect_error(
[arrow] branch master updated: MINOR: [R] Add Dewey + Dragoș as authors
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 76e4f53679 MINOR: [R] Add Dewey + Dragoș as authors
76e4f53679 is described below

commit 76e4f53679c5b4bbc1b26b3dd181ec990f7b9223
Author: Jonathan Keane
AuthorDate: Thu Apr 14 12:49:31 2022 -0500

    MINOR: [R] Add Dewey + Dragoș as authors

    also, alphabetize the `aut` group by last name

    Closes #12889 from jonkeane/add-authors

    Authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/DESCRIPTION | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index a5fb1ee9a4..46a3eefb68 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -5,8 +5,10 @@ Authors@R: c(
     person("Neal", "Richardson", email = "n...@ursalabs.org", role = c("aut", "cre")),
     person("Ian", "Cook", email = "ianmc...@gmail.com", role = c("aut")),
     person("Nic", "Crane", email = "thisis...@gmail.com", role = c("aut")),
-    person("Jonathan", "Keane", email = "jke...@gmail.com", role = c("aut")),
+    person("Dewey", "Dunnington", role = c("aut"), email = "de...@fishandwhistle.net", comment = c(ORCID = "-0002-9415-4582")),
     person("Romain", "Fran\u00e7ois", email = "rom...@rstudio.com", role = c("aut"), comment = c(ORCID = "-0002-2444-4226")),
+    person("Jonathan", "Keane", email = "jke...@gmail.com", role = c("aut")),
+    person("Drago\u0219", "Moldovan-Gr\u00fcnfeld", email = "dragos.m...@gmail.com", role = c("aut")),
     person("Jeroen", "Ooms", email = "jer...@berkeley.edu", role = c("aut")),
     person("Javier", "Luraschi", email = "jav...@rstudio.com", role = c("ctb")),
     person("Karl", "Dunkle Werner", email = "kar...@users.noreply.github.com", role = c("ctb"), comment = c(ORCID = "-0003-0523-7309")),
[arrow] branch master updated: ARROW-14168: [R] Warn only once about arrow function differences
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 9d24ded7f7 ARROW-14168: [R] Warn only once about arrow function differences 9d24ded7f7 is described below commit 9d24ded7f7d58717c9d78308b0c59ab7a9636006 Author: Edward Visel <1693477+alistair...@users.noreply.github.com> AuthorDate: Wed Apr 13 18:26:37 2022 -0500 ARROW-14168: [R] Warn only once about arrow function differences Addresses [ARROW-14168](https://issues.apache.org/jira/browse/ARROW-14168) by changing `median()` and `quantile()` to warn once, and adjusts the tests accordingly. Closes #12867 from alistaire47/feat/fun-diff-warn-once Lead-authored-by: Edward Visel <1693477+alistair...@users.noreply.github.com> Co-authored-by: Jonathan Keane Signed-off-by: Jonathan Keane --- r/NEWS.md | 1 + r/R/dplyr-summarize.R | 10 +- r/tests/testthat/helper-arrow.R | 14 +++ r/tests/testthat/test-dplyr-summarize.R | 205 +--- 4 files changed, 132 insertions(+), 98 deletions(-) diff --git a/r/NEWS.md b/r/NEWS.md index 1a1f198e0f..2e6a993507 100644 --- a/r/NEWS.md +++ b/r/NEWS.md @@ -27,6 +27,7 @@ * date-time functionality: * Added `difftime` and `as.difftime()` * Added `as.Date()` to convert to date +* `median()` and `quantile()` will warn once about approximate calculations regardless of interactivity. 
# arrow 7.0.0 diff --git a/r/R/dplyr-summarize.R b/r/R/dplyr-summarize.R index d8e6c46d92..6484d56866 100644 --- a/r/R/dplyr-summarize.R +++ b/r/R/dplyr-summarize.R @@ -106,8 +106,9 @@ register_bindings_aggregate <- function() { # this warning (ARROW-14021) warn( "quantile() currently returns an approximate quantile in Arrow", - .frequency = ifelse(is_interactive(), "once", "always"), - .frequency_id = "arrow.quantile.approximate" + .frequency = "once", + .frequency_id = "arrow.quantile.approximate", + class = "arrow.quantile.approximate" ) list( fun = "tdigest", @@ -120,8 +121,9 @@ register_bindings_aggregate <- function() { # this warning (ARROW-14021) warn( "median() currently returns an approximate median in Arrow", - .frequency = ifelse(is_interactive(), "once", "always"), - .frequency_id = "arrow.median.approximate" + .frequency = "once", + .frequency_id = "arrow.median.approximate", + class = "arrow.median.approximate" ) list( fun = "approximate_median", diff --git a/r/tests/testthat/helper-arrow.R b/r/tests/testthat/helper-arrow.R index 545f2d0440..873bb55712 100644 --- a/r/tests/testthat/helper-arrow.R +++ b/r/tests/testthat/helper-arrow.R @@ -56,6 +56,20 @@ test_that <- function(what, code) { }) } +# backport of 4.0.0 implementation +if (getRversion() < "4.0.0") { + suppressWarnings <- function(expr, classes = "warning") { +withCallingHandlers( + expr, + warning = function(w) { +if (inherits(w, classes)) { + invokeRestart("muffleWarning") +} + } +) + } +} + # Wrapper to run tests that only touch R code even when the C++ library isn't # available (so that at least some tests are run on those platforms) r_only <- function(code) { diff --git a/r/tests/testthat/test-dplyr-summarize.R b/r/tests/testthat/test-dplyr-summarize.R index efadb2722d..73e3312ee0 100644 --- a/r/tests/testthat/test-dplyr-summarize.R +++ b/r/tests/testthat/test-dplyr-summarize.R @@ -17,7 +17,13 @@ skip_if(on_old_windows()) -withr::local_options(list(arrow.summarise.sort = TRUE)) 
+withr::local_options(list( + arrow.summarise.sort = TRUE, + rlib_warning_verbosity = "verbose", + # This prevents the warning in `summarize()` about having grouped output without + # also specifying what to do with `.groups` + dplyr.summarise.inform = FALSE +)) library(dplyr, warn.conflicts = FALSE) library(stringr) @@ -296,52 +302,56 @@ test_that("median()", { # output of type float64. The calls to median(int, ...) in the tests below # are enclosed in as.double() to work around this known difference. - # Use old testthat behavior here so we don't have to assert the same warning - # over and over - local_edition(2) - # with groups - compare_dplyr_binding( -.input %>% - group_by(some_grouping) %>% - summarize( -med_dbl = median(dbl), -med_int = as.double(median(int)), -med_dbl_narmf = median(dbl, FALSE), -med_int_narmf = as.double(median(int, na.rm = FALSE)), -med
[arrow] branch master updated: ARROW-16165: [CI][Archery] Fix nightly query to crossbow to send reports
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 8f5936f4bc ARROW-16165: [CI][Archery] Fix nightly query to crossbow to send reports 8f5936f4bc is described below commit 8f5936f4bc5b02b3b3516831bbde23b55c213249 Author: Raúl Cumplido AuthorDate: Tue Apr 12 09:59:09 2022 -0500 ARROW-16165: [CI][Archery] Fix nightly query to crossbow to send reports This PR fixes the current issue with our nightly reports, as seen here: https://github.com/ursacomputing/crossbow/runs/5840285120?check_suite_focus=true The issue could be reproduced using the prefix that the crossbow reports use: ``` job_prefix=nightly-${{ inputs.report_type }}-$(date -I) job_id=$(archery crossbow latest-prefix ${job_prefix}) ``` Before the fix, the following query failed: ``` $ archery crossbow latest-prefix --no-fetch nightly-packaging-2022-04-10 Traceback (most recent call last): File "/home/raulcd/open_source/pyarrow-dev/bin/archery", line 33, in <module> sys.exit(load_entry_point('archery', 'console_scripts', 'archery')()) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1130, in __call__ return self.main(*args, **kwargs) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1055, in main rv = self.invoke(ctx) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1657, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1657, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1404, in invoke return ctx.invoke(self.callback, **ctx.params) File 
"/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 760, in invoke return __callback(*args, **kwargs) File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func return f(get_current_context().obj, *args, **kwargs) File "/home/raulcd/open_source/arrow/dev/archery/archery/crossbow/cli.py", line 237, in latest_prefix latest = queue.latest_for_prefix(prefix) File "/home/raulcd/open_source/arrow/dev/archery/archery/crossbow/core.py", line 568, in latest_for_prefix latest_id += "-0" TypeError: unsupported operand type(s) for +=: 'int' and 'str' ``` After the fix: ``` $ archery crossbow latest-prefix --no-fetch nightly-packaging-2022-04-10 nightly-packaging-2022-04-10-0 $ archery crossbow latest-prefix --no-fetch nightly-packaging nightly-packaging-2022-04-11-0 ``` Closes #12862 from raulcd/ARROW-16165 Authored-by: Raúl Cumplido Signed-off-by: Jonathan Keane --- dev/archery/archery/crossbow/core.py| 9 - dev/archery/archery/crossbow/tests/test_core.py | 26 - 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/dev/archery/archery/crossbow/core.py b/dev/archery/archery/crossbow/core.py index c41582ad40..1ad4763e29 100644 --- a/dev/archery/archery/crossbow/core.py +++ b/dev/archery/archery/crossbow/core.py @@ -542,6 +542,12 @@ class Queue(Repo): latest = -1 return latest +def _prefix_contains_date(self, prefix): +prefix_date_pattern = re.compile(r'[\w\/-]*-(\d+)-(\d+)-(\d+)') +match_prefix = prefix_date_pattern.match(prefix) +if match_prefix: +return match_prefix.group(0)[-10:] + def _latest_prefix_date(self, prefix): pattern = re.compile(r'[\w\/-]*{}-(\d+)-(\d+)-(\d+)'.format(prefix)) matches = list(filter(None, map(pattern.match, self.repo.branches))) @@ -559,7 +565,8 @@ class Queue(Repo): return '{}-{}'.format(prefix, latest_id + 1) def latest_for_prefix(self, prefix): -if prefix.startswith("nightly"): +prefix_date = self._prefix_contains_date(prefix) +if 
prefix.startswith("nightly") and not prefix_date: latest_id = self._latest_prefix_date(prefix) if not latest_id: raise RuntimeError( diff --git a/dev/archery/archery/crossbow/tests/test_core.py b/dev/archery/archery/crossbow/tests/test_core.py index 847aae2240..3d538b89b2 100644 --- a/dev/archery/archery/crossbow/tests/test_core.py +++ b/dev/archery/archery/crossbow/tests/test_core.py @@ -16,9 +16,10 @@ # unde
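The date check added by this commit can be tried in isolation. The following is a minimal standalone sketch, not the archery code itself: `prefix_contains_date` and `needs_latest_date_lookup` are hypothetical free-function names standing in for the `_prefix_contains_date` method and the new condition in `latest_for_prefix`; the regular expression is copied from the diff above.

```python
import re

# Same pattern as in the diff: any leading characters, followed by
# three hyphen-separated groups of digits (for example 2022-04-10).
PREFIX_DATE_PATTERN = re.compile(r'[\w\/-]*-(\d+)-(\d+)-(\d+)')


def prefix_contains_date(prefix):
    """Return the last 10 characters of the match when the prefix carries
    a date-like part, otherwise None.

    Hypothetical stand-in for Queue._prefix_contains_date.
    """
    match = PREFIX_DATE_PATTERN.match(prefix)
    if match:
        return match.group(0)[-10:]
    return None


def needs_latest_date_lookup(prefix):
    """Mirror the new condition: only search for the most recent nightly
    date when the prefix does not already include one."""
    return prefix.startswith("nightly") and not prefix_contains_date(prefix)
```

With this check, a prefix such as `nightly-packaging-2022-04-10` skips the date lookup that previously led to the `TypeError`, while a bare `nightly-packaging` still goes through it.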
[arrow] branch master updated: ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and `decimal_date()`
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 80bba5cbde ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and `decimal_date()` 80bba5cbde is described below commit 80bba5cbdef77e809a7b9bfec36eb5d6a61f0b5d Author: Dragoș Moldovan-Grünfeld AuthorDate: Mon Apr 11 13:16:13 2022 -0500 ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and `decimal_date()` This would allow the following operations: ``` r library(dplyr, warn.conflicts = FALSE) library(lubridate, warn.conflicts = FALSE) library(arrow, warn.conflicts = FALSE) test_df <- tibble( a = c(2007.38998954347, 1970.77732069883, 2020.96061799722, 2009.43465948477, 1975.71251467871, NA), b = as.POSIXct( c("2007-05-23 08:18:30", "1970-10-11 17:19:45", "2020-12-17 14:04:06", "2009-06-08 15:37:01", "1975-09-18 01:37:42", NA) ) ) test_df %>% mutate( decimal_date_from_date = decimal_date(b), date_from_decimal = date_decimal(a) ) #> # A tibble: 6 × 4 #> a b decimal_date_from_date date_from_decimal #> #> 1 2007. 2007-05-23 08:18:30 2007. 2007-05-23 08:18:30 #> 2 1971. 1970-10-11 17:19:45 1971. 1970-10-11 17:19:45 #> 3 2021. 2020-12-17 14:04:06 2021. 2020-12-17 14:04:06 #> 4 2009. 2009-06-08 15:37:01 2009. 2009-06-08 15:37:01 #> 5 1976. 1975-09-18 01:37:42 1976. 1975-09-18 01:37:42 #> 6 NA NA NA NA test_df %>% arrow_table() %>% mutate( decimal_date_from_date = decimal_date(b), date_from_decimal = date_decimal(a) ) %>% collect() #> # A tibble: 6 × 4 #> a b decimal_date_from_date date_from_decimal #> #> 1 2007. 2007-05-23 08:18:30 2007. 2007-05-23 08:18:30 #> 2 1971. 1970-10-11 17:19:45 1971. 1970-10-11 17:19:45 #> 3 2021. 2020-12-17 14:04:06 2021. 2020-12-17 14:04:06 #> 4 2009. 2009-06-08 15:37:01 2009. 2009-06-08 15:37:01 #> 5 1976. 1975-09-18 01:37:42 1976. 
1975-09-18 01:37:42 #> 6 NA NA NA NA ``` Created on 2022-03-28 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1) Closes #12707 from dragosmg/decimal_dates Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- r/NEWS.md| 7 +++-- r/R/dplyr-funcs-datetime.R | 42 r/tests/testthat/test-dplyr-funcs-datetime.R | 31 +++- 3 files changed, 76 insertions(+), 4 deletions(-) diff --git a/r/NEWS.md b/r/NEWS.md index 0a7d30d2a3..1a1f198e0f 100644 --- a/r/NEWS.md +++ b/r/NEWS.md @@ -22,10 +22,11 @@ * `read_csv_arrow()`'s readr-style type `T` is now mapped to `timestamp(unit = "ns")` instead of `timestamp(unit = "s")`. * `lubridate`: * component extraction functions: `tz()` (timezone), `semester()` (semester), `dst()` (daylight savings time indicator), `date()` (extract date), `epiyear()` (epiyear), improvements to `month()`, which now works with integer inputs. - * `make_date()` & `make_datetime()` + `ISOdatetime()` & `ISOdate()` to create date-times from numeric representations. + * Added `make_date()` & `make_datetime()` + `ISOdatetime()` & `ISOdate()` to create date-times from numeric representations. + * Added `decimal_date()` and `date_decimal()` * date-time functionality: - * `difftime` and `as.difftime()` - * `as.Date()` to convert to date + * Added `difftime` and `as.difftime()` + * Added `as.Date()` to convert to date # arrow 7.0.0 diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R index 754d02a436..1ca485f56e 100644 --- a/r/R/dplyr-funcs-datetime.R +++ b/r/R/dplyr-funcs-datetime.R @@ -270,6 +270,20 @@ register_bindings_duration <- function() { time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC"))) } +# if time1 or time2 are timestamps they cannot be expressed in "s" /seconds +# otherwise they cannot be added subtracted with durations +# TODO delete the casting to "us" once +# https://issues.apache.org/jira/browse/ARROW-16060 is solved +if (inherits(time1, "Expression&quo
[arrow] branch master updated: ARROW-14442: [R] fix behaviour when converting timestamps with "" as tzone
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 633687c1e6 ARROW-14442: [R] fix behaviour when converting timestamps with "" as tzone 633687c1e6 is described below commit 633687c1e6f940c78986af206a4bb2a478f25906 Author: Dragoș Moldovan-Grünfeld AuthorDate: Mon Apr 11 12:15:04 2022 -0500 ARROW-14442: [R] fix behaviour when converting timestamps with "" as tzone Closes #12240 from dragosmg/timestampts_missing_timezone Authored-by: Dragoș Moldovan-Grünfeld Signed-off-by: Jonathan Keane --- r/src/type_infer.cpp | 6 -- r/tests/testthat/test-Array.R| 27 --- r/tests/testthat/test-dplyr-funcs-datetime.R | 4 3 files changed, 28 insertions(+), 9 deletions(-) diff --git a/r/src/type_infer.cpp b/r/src/type_infer.cpp index 2568757aa2..75b1e85c42 100644 --- a/r/src/type_infer.cpp +++ b/r/src/type_infer.cpp @@ -72,7 +72,8 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { } else if (Rf_inherits(x, "POSIXct")) { auto tzone_sexp = Rf_getAttrib(x, symbols::tzone); if (Rf_isNull(tzone_sexp)) { - return timestamp(TimeUnit::MICRO); + auto systzone_sexp = cpp11::package("base")["Sys.timezone"]; + return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(systzone_sexp(), 0))); } else { return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(tzone_sexp, 0))); } @@ -88,7 +89,8 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "POSIXct")) { auto tzone_sexp = Rf_getAttrib(x, symbols::tzone); if (Rf_isNull(tzone_sexp)) { - return timestamp(TimeUnit::MICRO); + auto systzone_sexp = cpp11::package("base")["Sys.timezone"]; + return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(systzone_sexp(), 0))); } else { return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(tzone_sexp, 0))); } diff --git a/r/tests/testthat/test-Array.R b/r/tests/testthat/test-Array.R index 15d6d79247..2f75efb3d6 100644 --- 
a/r/tests/testthat/test-Array.R +++ b/r/tests/testthat/test-Array.R @@ -260,11 +260,11 @@ test_that("array supports POSIXct (ARROW-3340)", { expect_array_roundtrip(times2, timestamp("us", "US/Eastern")) }) -test_that("array supports POSIXct without timezone", { - # Make sure timezone is not set +test_that("array uses local timezone for POSIXct without timezone", { withr::with_envvar(c(TZ = ""), { times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10 -expect_array_roundtrip(times, timestamp("us", "")) +expect_equal(attr(times, "tzone"), NULL) +expect_array_roundtrip(times, timestamp("us", Sys.timezone())) # Also test the INTSXP code path skip("Ingest_POSIXct only implemented for REALSXP") @@ -272,6 +272,27 @@ test_that("array supports POSIXct without timezone", { attributes(times_int) <- attributes(times) expect_array_roundtrip(times_int, timestamp("us", "")) }) + + # If there is a timezone set, we record that + withr::with_timezone("Pacific/Marquesas", { +times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10 +expect_equal(attr(times, "tzone"), "Pacific/Marquesas") +expect_array_roundtrip(times, timestamp("us", "Pacific/Marquesas")) + +times_with_tz <- strptime( + "2019-02-03 12:34:56", + format = "%Y-%m-%d %H:%M:%S", + tz = "Asia/Katmandu") + 1:10 +expect_equal(attr(times, "tzone"), "Asia/Katmandu") +expect_array_roundtrip(times, timestamp("us", "Asia/Katmandu")) + }) + + # and although the TZ is NULL in R, we set it to the Sys.timezone() + withr::with_timezone(NA, { +times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 1:10 +expect_equal(attr(times, "tzone"), NULL) +expect_array_roundtrip(times, timestamp("us", Sys.timezone())) + }) }) test_that("Timezone handling in Arrow roundtrip (ARROW-3543)", { diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R b/r/tests/testthat/test-dplyr-funcs-datetime.R index 16e4958f1c..c901742f65 100644 --- a/r/tests/testthat/test-dplyr-funcs-datetime.R +++ 
b/r/tests/testthat/test-dplyr-funcs-datetime.R @@ -693,7 +693,6 @@ test_that("extract yday from date", { }) test_that("leap_year mirror lubridate", { - compare_dplyr_binding( .input %>% mutate(x = leap_year(date)) %>% @@ -721,7 +720,6 @@ test_that("leap_year mirror lubridate", { )) ) ) - }) test_that("am/pm mirror lubridate", { @@ -741,10 +739,8 @@ test_that("am/pm mirror lubridate", { ), format = "%Y-%m-%d %H:%M:%S" ) - ) ) - }) test_that("extract tz", {
[arrow] branch master updated: ARROW-16156: [R] Clarify warning message for features not turned on in .onAttach()
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 961ec771b9 ARROW-16156: [R] Clarify warning message for features not turned on in .onAttach()
961ec771b9 is described below

commit 961ec771b9bb84b71be3af398e7b3df3e10f291b
Author: Dewey Dunnington
AuthorDate: Fri Apr 8 17:22:16 2022 -0500

    ARROW-16156: [R] Clarify warning message for features not turned on in .onAttach()

    After ARROW-15818 (#12564) we get an extra message on package load because "engine" was added to `arrow_info()$capabilities` and few if any users will have this turned on for at least the next release:

    ```r
    library(arrow)
    #> See arrow_info() for available features
    ```

    This PR adds "engine" to the list of features we don't message users about and clarifies the message so that it's more clear why it's being shown:

    ```r
    library(arrow)
    #> Some features of Arrow C++ are turned off. Run `arrow_info()` for more information.
    ```

    Closes #12842 from paleolimbot/r-onattach

    Lead-authored-by: Dewey Dunnington
    Co-authored-by: Jonathan Keane
    Signed-off-by: Jonathan Keane
---
 r/R/arrow-package.R | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 896363a478..3c810bb8f2 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -107,7 +107,12 @@
       #
       # Let's print a message if some are off
       if (some_features_are_off(features)) {
-        packageStartupMessage("See arrow_info() for available features")
+        packageStartupMessage(
+          paste(
+            "Some features are not enabled in this build of Arrow.",
+            "Run `arrow_info()` for more information."
+          )
+        )
       }
     })
   }
@@ -264,7 +269,7 @@ arrow_info <- function() {
 some_features_are_off <- function(features) {
   # `features` is a named logical vector (as in arrow_info()$capabilities)
   # Let's exclude some less relevant ones
-  blocklist <- c("lzo", "bz2", "brotli")
+  blocklist <- c("lzo", "bz2", "brotli", "engine")
   # Return TRUE if any of the other features are FALSE
   !all(features[setdiff(names(features), blocklist)])
 }
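The blocklist logic in `some_features_are_off()` can be illustrated outside of R. This is a rough Python sketch of the same idea, assuming the capabilities are available as a plain dictionary of booleans; the function below is a hypothetical translation for illustration and is not part of the arrow package.

```python
def some_features_are_off(features, blocklist=("lzo", "bz2", "brotli", "engine")):
    """Return True if any feature outside the blocklist is disabled.

    `features` is assumed to be a dict mapping capability names to
    booleans, loosely modelled on arrow_info()$capabilities in R.
    """
    # Ignore the less relevant features, then check whether every
    # remaining one is enabled.
    return not all(
        enabled for name, enabled in features.items() if name not in blocklist
    )
```

Because "engine" is now in the blocklist, a build that only lacks the engine feature would no longer trigger the startup message.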
[arrow] branch master updated: MINOR: [R] Fix compiler warning/CMD check NOTE when compiling with ARROW_R_WITH_ENGINE
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new d197ad31c3 MINOR: [R] Fix compiler warning/CMD check NOTE when compiling with ARROW_R_WITH_ENGINE
d197ad31c3 is described below

commit d197ad31c3d7c16ecee74cb76a71ce397e905b3b
Author: Dewey Dunnington
AuthorDate: Fri Apr 8 12:24:17 2022 -0500

    MINOR: [R] Fix compiler warning/CMD check NOTE when compiling with ARROW_R_WITH_ENGINE

    After ARROW-16033 (#12721) we get this compiler warning when compiling with `ARROW_R_WITH_ENGINE`:

    ```
    compute-exec.cpp:304:17: warning: 'Init' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
      arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema) {
                    ^
    /Users/deweydunnington/.r-arrow-dev-build/dist/include/arrow/compute/exec/options.h:153:18: note: overridden virtual function is here
      virtual Status Init(const std::shared_ptr<Schema>& schema) = 0;
                     ^
    1 warning generated.
    ```

    This PR just adds the requisite `override`.

    Closes #12823 from paleolimbot/r-minor-override

    Authored-by: Dewey Dunnington
    Signed-off-by: Jonathan Keane
---
 r/src/compute-exec.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/src/compute-exec.cpp b/r/src/compute-exec.cpp
index a1a679144d..e7d8df55bb 100644
--- a/r/src/compute-exec.cpp
+++ b/r/src/compute-exec.cpp
@@ -301,7 +301,7 @@ class AccumulatingConsumer : public compute::SinkNodeConsumer {
  public:
   const std::vector<std::shared_ptr<arrow::RecordBatch>>& batches() { return batches_; }
 
-  arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema) {
+  arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema) override {
     schema_ = schema;
     return arrow::Status::OK();
   }
[arrow] branch master updated: ARROW-15471: [R] ExtensionType support in R
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new 489aada557 ARROW-15471: [R] ExtensionType support in R 489aada557 is described below commit 489aada557f267f4b9745b039034e9f5b0e1f485 Author: Dewey Dunnington AuthorDate: Fri Apr 8 10:52:12 2022 -0500 ARROW-15471: [R] ExtensionType support in R This PR implements extension type support and registration in the R bindings (as has been possible in the Python bindings for some time). The details still need to be worked out, but we at least have a working pattern: ``` r library(arrow, warn.conflicts = FALSE) library(R6) SomeExtensionTypeSubclass <- R6Class( "SomeExtensionTypeSubclass", inherit = arrow:::ExtensionType, public = list( some_custom_method = function() { private$some_custom_field }, .Deserialize = function(storage_type, extension_name, extension_metadata) { private$some_custom_field <- head(extension_metadata, 5) } ), private = list( some_custom_field = NULL ) ) SomeExtensionArraySubclass <- R6Class( "SomeExtensionArraySubclass", inherit = arrow:::ExtensionArray, public = list( some_custom_method = function() { self$type$some_custom_method() } ) ) type <- arrow:::MakeExtensionType( int32(), "some_extension_subclass", charToRaw("some custom metadata"), type_class = SomeExtensionTypeSubclass, array_class = SomeExtensionArraySubclass ) arrow:::RegisterExtensionType(type) # survives the C API round trip ptr_type <- arrow:::allocate_arrow_schema() type$export_to_c(ptr_type) type2 <- arrow:::DataType$import_from_c(ptr_type) type2 #> SomeExtensionTypeSubclass #> SomeExtensionTypeSubclass type2$some_custom_method() #> [1] 73 6f 6d 65 20 (array <- type$WrapArray(Array$create(1:10))) #> SomeExtensionArraySubclass #> > #> [ #> 1, #> 2, #> 3, #> 4, #> 5, #> 6, #> 7, #> 8, #> 9, #> 10 #> ] array$some_custom_method() #> [1] 73 6f 6d 
65 20 ptr_array <- arrow:::allocate_arrow_array() array$export_to_c(ptr_array, ptr_type) (array2 <- Array$import_from_c(ptr_array, ptr_type)) #> SomeExtensionArraySubclass #> > #> [ #> 1, #> 2, #> 3, #> 4, #> 5, #> 6, #> 7, #> 8, #> 9, #> 10 #> ] arrow:::delete_arrow_schema(ptr_type) arrow:::delete_arrow_array(ptr_array) ``` Created on 2022-02-18 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1) Closes #12467 from paleolimbot/r-extension-type Authored-by: Dewey Dunnington Signed-off-by: Jonathan Keane --- r/DESCRIPTION| 1 + r/NAMESPACE | 9 + r/R/arrow-package.R | 5 + r/R/arrowExports.R | 36 +++ r/R/extension.R | 545 +++ r/_pkgdown.yml | 5 + r/man/ExtensionArray.Rd | 23 ++ r/man/ExtensionType.Rd | 48 +++ r/man/new_extension_type.Rd | 167 +++ r/man/vctrs_extension_array.Rd | 50 r/src/array.cpp | 2 + r/src/array_to_vector.cpp| 33 +++ r/src/arrowExports.cpp | 150 ++ r/src/datatype.cpp | 2 + r/src/extension-impl.cpp | 198 + r/src/extension.h| 75 + r/tests/testthat/_snaps/extension.md | 10 + r/tests/testthat/test-extension.R| 345 ++ 18 files changed, 1704 insertions(+) diff --git a/r/DESCRIPTION b/r/DESCRIPTION index 36a55c05b2..a5fb1ee9a4 100644 --- a/r/DESCRIPTION +++ b/r/DESCRIPTION @@ -108,6 +108,7 @@ Collate: 'table.R' 'dplyr.R' 'duckdb.R' +'extension.R' 'feather.R' 'field.R' 'filesystem.R' diff --git a/r/NAMESPACE b/r/NAMESPACE index f32e73f537..da43a3f511 100644 --- a/r/NAMESPACE +++ b/r/NAMESPACE @@ -134,6 +134,8 @@ export(DictionaryArray) export(DirectoryPartitioning) export(DirectoryPartitioningFactory) export(Expression) +export(ExtensionArray) +export(ExtensionType) export(FeatherReader) export(Field) export(FileFormat) @@ -267,6 +269,8 @@ export(match_arrow) export(matches) export(mmap_create) export(mmap_open) +export(new_extension_array) +export(new_extension_type) export(null) expo
[arrow] branch master updated: ARROW-16038: [R] different behavior from dplyr when mutate's `.keep` option is set
This is an automated email from the ASF dual-hosted git repository. jonkeane pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git The following commit(s) were added to refs/heads/master by this push: new b8299436c8 ARROW-16038: [R] different behavior from dplyr when mutate's `.keep` option is set b8299436c8 is described below commit b8299436c8b1a2d7cd3d6e019a2b750893a3af87 Author: SAm Albers AuthorDate: Thu Apr 7 16:07:34 2022 -0500 ARROW-16038: [R] different behavior from dplyr when mutate's `.keep` option is set This PR does two things to match dplyr's behaviour around column order: 1) Mimics the dplyr implementation of `mutate(..., .keep = "none")` to append new columns after the existing columns (if suggested), as per [dplyr#6086](https://github.com/tidyverse/dplyr/issues/6086). 2) As per the same [discussion](https://github.com/tidyverse/dplyr/issues/6086), this required a bespoke approach to `transmute`, as it is not simply a wrapper for `mutate(..., .keep = "none")`. This cascades into needing to catch a couple of edge cases. I have also added some tests for this behaviour. 
Closes #12818 from boshek/mutate-keep Authored-by: SAm Albers Signed-off-by: Jonathan Keane --- r/NAMESPACE | 1 + r/R/arrow-package.R | 2 +- r/R/dplyr-mutate.R | 17 +++-- r/tests/testthat/test-dplyr-mutate.R | 34 +- 4 files changed, 46 insertions(+), 8 deletions(-) diff --git a/r/NAMESPACE b/r/NAMESPACE index 7cb89b0a53..f32e73f537 100644 --- a/r/NAMESPACE +++ b/r/NAMESPACE @@ -331,6 +331,7 @@ importFrom(rlang,":=") importFrom(rlang,.data) importFrom(rlang,abort) importFrom(rlang,as_function) +importFrom(rlang,as_label) importFrom(rlang,as_quosure) importFrom(rlang,call2) importFrom(rlang,caller_env) diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R index 2fab03d08c..256bc7aefa 100644 --- a/r/R/arrow-package.R +++ b/r/R/arrow-package.R @@ -23,7 +23,7 @@ #' @importFrom rlang eval_tidy new_data_mask syms env new_environment env_bind set_names exec #' @importFrom rlang is_bare_character quo_get_expr quo_get_env quo_set_expr .data seq2 is_interactive #' @importFrom rlang expr caller_env is_character quo_name is_quosure enexpr enexprs as_quosure -#' @importFrom rlang is_list call2 is_empty as_function +#' @importFrom rlang is_list call2 is_empty as_function as_label #' @importFrom tidyselect vars_pull vars_rename vars_select eval_select #' @useDynLib arrow, .registration = TRUE #' @keywords internal diff --git a/r/R/dplyr-mutate.R b/r/R/dplyr-mutate.R index 986f29cc1d..07802f8c83 100644 --- a/r/R/dplyr-mutate.R +++ b/r/R/dplyr-mutate.R @@ -94,7 +94,10 @@ mutate.arrow_dplyr_query <- function(.data, # Respect .keep if (.keep == "none") { -.data$selected_columns <- .data$selected_columns[new_vars] +## for consistency with dplyr, this appends new columns after existing columns +## by specifying the order +new_cols_last <- c(intersect(old_vars, new_vars), setdiff(new_vars, old_vars)) +.data$selected_columns <- .data$selected_columns[new_cols_last] } else if (.keep != "all") { # "used" or "unused" used_vars <- unlist(lapply(exprs, all.vars), use.names = FALSE) @@ -112,7 
+115,17 @@ mutate.Dataset <- mutate.ArrowTabular <- mutate.RecordBatchReader <- mutate.arro transmute.arrow_dplyr_query <- function(.data, ...) { dots <- check_transmute_args(...) - dplyr::mutate(.data, !!!dots, .keep = "none") + has_null <- map_lgl(dots, quo_is_null) + .data <- dplyr::mutate(.data, !!!dots, .keep = "none") + if (is_empty(dots) | any(has_null)) { +return(.data) + } + + ## keeping with: https://github.com/tidyverse/dplyr/issues/6086 + cur_exprs <- map_chr(dots, as_label) + transmute_order <- names(cur_exprs) + transmute_order[!nzchar(transmute_order)] <- cur_exprs[!nzchar(transmute_order)] + dplyr::select(.data, all_of(transmute_order)) } transmute.Dataset <- transmute.ArrowTabular <- transmute.RecordBatchReader <- transmute.arrow_dplyr_query diff --git a/r/tests/testthat/test-dplyr-mutate.R b/r/tests/testthat/test-dplyr-mutate.R index 61d9edac1e..a746335940 100644 --- a/r/tests/testthat/test-dplyr-mutate.R +++ b/r/tests/testthat/test-dplyr-mutate.R @@ -74,6 +74,16 @@ test_that("transmute", { ) }) +test_that("transmute respect bespoke dplyr implementation", { + ## see: https://github.com/tidyverse/dplyr/issues/6086 + compare_dplyr_binding( +.input %>% + transmute(dbl, int = int + 6L) %>% + collect(), +tbl + ) +}) + test_that("transmute() with NULL inputs", { compare_dplyr_binding( .input %>% @@ -92,6 +102,20 @@ test_that
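The column-ordering rules in this change can be sketched with plain lists. The Python below is only an illustration of the ordering logic under the assumption that column names are unique; the function names are hypothetical and do not exist in arrow or dplyr.

```python
def keep_none_order(old_vars, new_vars):
    """Order columns as in the `.keep = "none"` branch of the diff.

    Mirrors c(intersect(old_vars, new_vars), setdiff(new_vars, old_vars)):
    existing columns that were reassigned keep their original position,
    and newly created columns are appended after them.
    """
    reassigned = [v for v in old_vars if v in new_vars]
    created = [v for v in new_vars if v not in old_vars]
    return reassigned + created


def transmute_order(names, labels):
    """Order columns as in the bespoke transmute() path.

    `names` holds the argument names ("" when unnamed) and `labels` the
    expression labels; unnamed arguments fall back to their label, so
    the output follows the order in which arguments were written.
    """
    return [name if name else label for name, label in zip(names, labels)]
```

For example, `transmute(dbl, int = int + 6L)` has names `["", "int"]` and labels `["dbl", "int + 6L"]`, which yields the order `dbl`, `int` regardless of where those columns sat in the input.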
[arrow] branch master updated: ARROW-15841: [R] Implement SafeCallIntoR to safely call the R API from another thread
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new e110eac71a ARROW-15841: [R] Implement SafeCallIntoR to safely call the R API from another thread
e110eac71a is described below

commit e110eac71aae63041a595fc1c8cc51960ba97f06
Author: Dewey Dunnington
AuthorDate: Thu Apr 7 11:50:28 2022 -0500

ARROW-15841: [R] Implement SafeCallIntoR to safely call the R API from another thread

This is a very WIP draft that currently just sketches a few things related to calling into R from other threads. Some code to get started:

``` r
arrow:::TestSafeCallIntoR(
  list(
    function() "string one",
    function() "string two"
  )
)
#> [1] "string one" "string two"

arrow:::TestSafeCallIntoR(
  list(
    function() stop("This is an error!")
  )
)
#> Error in (function () : This is an error!
```

Closes #12558 from paleolimbot/r-safe-call-into

Authored-by: Dewey Dunnington
Signed-off-by: Jonathan Keane
---
 r/R/arrow-package.R                      |   5 ++
 r/R/arrowExports.R                       |   8 ++
 r/src/arrowExports.cpp                   |  33 +++
 r/src/safe-call-into-r-impl.cpp          |  89 +++
 r/src/safe-call-into-r.h                 | 145 +++
 r/tests/testthat/test-safe-call-into-r.R |  60 +
 6 files changed, 340 insertions(+)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 509382e5da..2fab03d08c 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -31,6 +31,11 @@
 #' @importFrom vctrs s3_register vec_size vec_cast vec_unique
 .onLoad <- function(...) {
+  if (arrow_available()) {
+    # Make sure C++ knows on which thread it is safe to call the R API
+    InitializeMainRThread()
+  }
+
   dplyr_methods <- paste0(
     "dplyr::",
     c(
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index f43ef730ca..5ef6312196 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -1732,6 +1732,14 @@ ipc___RecordBatchStreamWriter__Open <- function(stream, schema, use_legacy_forma
   .Call(`_arrow_ipc___RecordBatchStreamWriter__Open`, stream, schema, use_legacy_format, metadata_version)
 }
 
+InitializeMainRThread <- function() {
+  invisible(.Call(`_arrow_InitializeMainRThread`))
+}
+
+TestSafeCallIntoR <- function(r_fun_that_returns_a_string, opt) {
+  .Call(`_arrow_TestSafeCallIntoR`, r_fun_that_returns_a_string, opt)
+}
+
 Array__GetScalar <- function(x, i) {
   .Call(`_arrow_Array__GetScalar`, x, i)
 }
diff --git a/r/src/arrowExports.cpp b/r/src/arrowExports.cpp
index 45a883321d..0a29ed0872 100644
--- a/r/src/arrowExports.cpp
+++ b/r/src/arrowExports.cpp
@@ -6822,6 +6822,37 @@ extern "C" SEXP _arrow_ipc___RecordBatchStreamWriter__Open(SEXP stream_sexp, SEX
 }
 #endif
 
+// safe-call-into-r-impl.cpp
+#if defined(ARROW_R_WITH_ARROW)
+void InitializeMainRThread();
+extern "C" SEXP _arrow_InitializeMainRThread(){
+BEGIN_CPP11
+  InitializeMainRThread();
+  return R_NilValue;
+END_CPP11
+}
+#else
+extern "C" SEXP _arrow_InitializeMainRThread(){
+  Rf_error("Cannot call InitializeMainRThread(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries. ");
+}
+#endif
+
+// safe-call-into-r-impl.cpp
+#if defined(ARROW_R_WITH_ARROW)
+std::string TestSafeCallIntoR(cpp11::function r_fun_that_returns_a_string, std::string opt);
+extern "C" SEXP _arrow_TestSafeCallIntoR(SEXP r_fun_that_returns_a_string_sexp, SEXP opt_sexp){
+BEGIN_CPP11
+  arrow::r::Input<cpp11::function>::type r_fun_that_returns_a_string(r_fun_that_returns_a_string_sexp);
+  arrow::r::Input<std::string>::type opt(opt_sexp);
+  return cpp11::as_sexp(TestSafeCallIntoR(r_fun_that_returns_a_string, opt));
+END_CPP11
+}
+#else
+extern "C" SEXP _arrow_TestSafeCallIntoR(SEXP r_fun_that_returns_a_string_sexp, SEXP opt_sexp){
+  Rf_error("Cannot call TestSafeCallIntoR(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries. ");
+}
+#endif
+
 // scalar.cpp
 #if defined(ARROW_R_WITH_ARROW)
 std::shared_ptr Array__GetScalar(const std::shared_ptr& x, int64_t i);
@@ -8146,6 +8177,8 @@ static const R_CallMethodDef CallEntries[] = {
   { "_arrow_ipc___RecordBatchWriter__Close", (DL_FUNC) &_arrow_ipc___RecordBatchWriter__Close, 1},
   { "_arrow_ipc___RecordBatchFileWriter__Open", (DL_FUNC) &_arrow_ipc___RecordBatchFileWriter__Open, 4},
   { "_arrow_ipc___RecordBatchStreamWriter__Open", (DL_FUNC) &_arrow_ipc___RecordBatchStreamW
[arrow] branch master updated (a1a255b -> a1f32fa)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from a1a255b  MINOR: [R] add some verbosity to homebrew tests
     add a1f32fa  MINOR: [R] Avoid {glue}'s whitespace trimming

No new revisions were added by this update.

Summary of changes:
 r/data-raw/codegen.R | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[arrow] branch master updated (d4798ef -> a1a255b)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from d4798ef  ARROW-16061: [R] [CI] Speed up windows 3.6 builds
     add a1a255b  MINOR: [R] add some verbosity to homebrew tests

No new revisions were added by this update.

Summary of changes:
 dev/tasks/r/github.macos.brew.yml | 1 +
 1 file changed, 1 insertion(+)
[arrow] branch master updated (4a90e39 -> d4798ef)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 4a90e39  ARROW-16078: Upgrade bundled zlib to 1.2.12
     add d4798ef  ARROW-16061: [R] [CI] Speed up windows 3.6 builds

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)
[arrow] branch master updated (64560af -> ba04e7f)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 64560af  ARROW-16053: [C++][FlightRPC] Fix flaky test TestAuthHandler.FailUnauthenticatedCalls
     add ba04e7f  ARROW-15659 [R] strptime should return NA (not error) with format mismatch

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R                   |  2 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 42
 2 files changed, 43 insertions(+), 1 deletion(-)
[arrow] branch master updated (6f9b07a -> 4d0436a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6f9b07a  ARROW-15975: [C++] Document type traits and inline visitors
     add 4d0436a  ARROW-15818: [R] Implement initial Substrait consumer in the R bindings

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE                          |  1 +
 r/R/arrow-package.R                  |  9
 r/R/arrowExports.R                   | 12 +
 r/R/query-engine.R                   | 13 +
 r/configure                          |  8 +++
 r/configure.win                      |  9 +++-
 r/data-raw/codegen.R                 |  2 +-
 r/man/arrow_available.Rd             |  3 ++
 r/src/arrowExports.cpp               | 62 ++-
 r/src/compute-exec.cpp               | 98
 r/tests/testthat/test-query-engine.R | 63 +++
 11 files changed, 276 insertions(+), 4 deletions(-)
 create mode 100644 r/tests/testthat/test-query-engine.R
[arrow] branch master updated (ad7380e -> b781710)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from ad7380e  ARROW-15857: [R] rhub/fedora-clang-devel fails to install 'sass' (rmarkdown dependency)
     add b781710  ARROW-15947: [R] rename_with s3 method for arrow_dplyr_query

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE                          |  1 +
 r/R/arrow-package.R                  |  4 +--
 r/R/dplyr-select.R                   |  7 +
 r/tests/testthat/test-dplyr-select.R | 54 ++--
 4 files changed, 61 insertions(+), 5 deletions(-)
[arrow] branch master updated (6a0770c -> ad7380e)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6a0770c  ARROW-15665: [C++] Fix error_is_null in strptime with invalid inputs
     add ad7380e  ARROW-15857: [R] rhub/fedora-clang-devel fails to install 'sass' (rmarkdown dependency)

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_docker_configure.sh | 3 +++
 1 file changed, 3 insertions(+)
[arrow] branch master updated (919d113 -> f4dfd6c)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 919d113  ARROW-13564: [Dev] Check individual commit messages for "Co-authored-by:" tags when integrating a pull request
     add f4dfd6c  ARROW-13168: [C++][R] Enable runtime timezone database for Windows

No new revisions were added by this update.

Summary of changes:
 .github/workflows/cpp.yml                          |  6 +++
 ci/appveyor-cpp-setup.bat                          | 14 ++
 ..._cgo_python_test.sh => download_tz_database.sh} | 29 +++-
 .../arrow/compute/kernels/scalar_cast_string.cc    |  7 ---
 cpp/src/arrow/compute/kernels/scalar_cast_test.cc  | 43 ++
 .../arrow/compute/kernels/scalar_temporal_test.cc  | 19 +---
 .../arrow/compute/kernels/scalar_temporal_unary.cc | 13 --
 cpp/src/arrow/config.cc                            | 28
 cpp/src/arrow/config.h                             | 18
 cpp/src/arrow/public_api_test.cc                   | 44 +++
 cpp/src/arrow/testing/util.cc                      | 20 +
 cpp/src/arrow/testing/util.h                       |  8
 docs/source/cpp/api/support.rst                    | 10 +
 docs/source/cpp/build_system.rst                   | 23 ++
 docs/source/developers/cpp/windows.rst             |  9
 r/DESCRIPTION                                      |  1 +
 r/R/arrow-package.R                                |  8
 r/R/arrowExports.R                                 |  4 ++
 r/R/dplyr-funcs-datetime.R                         | 15 +--
 r/src/arrowExports.cpp                             | 17
 r/src/config.cpp                                   | 13 ++
 r/tests/testthat/test-dplyr-funcs-datetime.R       | 51 +-
 22 files changed, 281 insertions(+), 119 deletions(-)
 copy ci/scripts/{go_cgo_python_test.sh => download_tz_database.sh} (65%)
 mode change 100755 => 100644
[arrow] branch master updated (6ab947b -> d327f69)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6ab947b  ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods
     add d327f69  ARROW-15814: [R][DOCS] Improve documentation for cast()

No new revisions were added by this update.

Summary of changes:
 r/R/type.R         | 22 ++
 r/man/data-type.Rd | 22 ++
 2 files changed, 44 insertions(+)
[arrow] branch master updated (bfa3bca -> 5bd4d8e)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from bfa3bca  ARROW-15313: [C++][Java][FlightRPC] Implement type info method to flight-sql
     add 5bd4d8e  ARROW-16007: [R] grepl bindings return FALSE for NA inputs

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-string.R                   | 44 ---
 r/tests/testthat/test-dplyr-funcs-string.R | 58 ++
 2 files changed, 83 insertions(+), 19 deletions(-)
[arrow] branch master updated (a17137f -> e83ef42)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from a17137f  ARROW-15921: [Format][FlightRPC][C++][Java] Clarify interpretation of FlightEndpoint.locations
     add e83ef42  ARROW-15098 [R] Add binding for `lubridate::duration()` and/or `as.difftime()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |   1 +
 r/R/dplyr-funcs-datetime.R                   |  64 +
 r/R/dplyr-funcs.R                            |   1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 137 +++
 4 files changed, 203 insertions(+)
[arrow] branch master updated (ad2fb74 -> 5073d63)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from ad2fb74  ARROW-15960: [C++] Fix crash on adaptive int builder edge cases
     add 5073d63  MINOR: [R] Run the styler

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)
[arrow] branch master updated (b0d6e27 -> 5bd9943)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from b0d6e27  ARROW-15544: [Go][Parquet] Fix origin schema base64 decoding
     add 5bd9943  ARROW-15489: [R] Expand RecordBatchReader usability

No new revisions were added by this update.

Summary of changes:
 r/R/record-batch-reader.R                   | 14 +-
 r/tests/testthat/test-record-batch-reader.R | 28
 2 files changed, 41 insertions(+), 1 deletion(-)
[arrow] branch master updated (acda3c6 -> de02cfc)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from acda3c6  ARROW-14679: [R] [C++] Handle suffix argument in joins
     add de02cfc  ARROW-15802 [R] bindings for `lubridate::make_datetime()` and `lubridate::make_date()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |   1 +
 r/R/dplyr-funcs-datetime.R                   |  46
 r/tests/testthat/test-dplyr-funcs-datetime.R | 102 +++
 3 files changed, 149 insertions(+)
[arrow] branch master updated (70b8a82 -> acda3c6)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 70b8a82  ARROW-15919: [C++] Add function not commutative with timestamps & duration maths
     add acda3c6  ARROW-14679: [R] [C++] Handle suffix argument in joins

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R                 |  5 ++-
 r/R/dplyr-collect.R                | 31 +--
 r/R/dplyr-join.R                   |  4 +--
 r/R/query-engine.R                 | 21 +++--
 r/src/arrowExports.cpp             | 12 +---
 r/src/compute-exec.cpp             |  9 --
 r/tests/testthat/test-dplyr-join.R | 63 +-
 7 files changed, 120 insertions(+), 25 deletions(-)
[arrow] branch master updated (a7f91ec -> ae93d12)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from a7f91ec  ARROW-15296: [CI][GO] Add Go staticcheck linting to CI lint job
     add ae93d12  ARROW-15627: [R] Fix union dataset unify schema

No new revisions were added by this update.

Summary of changes:
 r/R/dataset.R                   | 21 +++--
 r/tests/testthat/test-dataset.R | 51 +
 2 files changed, 65 insertions(+), 7 deletions(-)
[arrow] branch master updated (93ea682 -> 74200f5)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 93ea682  ARROW-15929: [R] io_thread_count is actually the CPU thread count
     add 74200f5  ARROW-15875: [R] Expose ReadMetadata for input streams

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R         |  4
 r/R/io.R                   |  3 +++
 r/src/arrowExports.cpp     | 26 +-
 r/src/io.cpp               | 24
 r/tests/testthat/test-io.R | 12
 r/tests/testthat/test-s3.R |  7 +++
 6 files changed, 71 insertions(+), 5 deletions(-)
[arrow] branch master updated (d459311 -> 93ea682)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from d459311  MINOR: [Docs] Update architectural_overview.rst
     add 93ea682  ARROW-15929: [R] io_thread_count is actually the CPU thread count

No new revisions were added by this update.

Summary of changes:
 r/R/config.R                     |  4
 r/src/threadpool.cpp             |  5 ++--
 .../tests/testthat/test-config.R | 28 +++---
 3 files changed, 26 insertions(+), 11 deletions(-)
 copy ci/scripts/install_ceph.sh => r/tests/testthat/test-config.R (52%)
 mode change 100755 => 100644
[arrow] branch master updated: MINOR: [R] [CI] Disable the DuckDB dev tests that are failing
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new 8e13c2d MINOR: [R] [CI] Disable the DuckDB dev tests that are failing
8e13c2d is described below

commit 8e13c2ddcf589fb03bc239098f8ade329c283b50
Author: Jonathan Keane
AuthorDate: Fri Mar 18 10:38:43 2022 -0500

MINOR: [R] [CI] Disable the DuckDB dev tests that are failing

This is being tracked at https://github.com/duckdb/duckdb/issues/3258 and we have a follow up to re-enable: https://issues.apache.org/jira/browse/ARROW-15970

Closes #12666 from jonkeane/disable-dev-duckdb

Authored-by: Jonathan Keane
Signed-off-by: Jonathan Keane
---
 dev/tasks/tasks.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml
index 0992aa1..2fd5de6 100644
--- a/dev/tasks/tasks.yml
+++ b/dev/tasks/tasks.yml
@@ -148,6 +148,8 @@ groups:
     - example-*
     - wheel-*
     - python-sdist
+    # ARROW-15970 and duckdb/duckdb#3258
+    - ~test-r-dev-duckdb
 
 tasks:
   # arbitrary_task_name:
[arrow] branch master updated: ARROW-14199 [R] bindings for format (where possible)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new dc2e0b2 ARROW-14199 [R] bindings for format (where possible)
dc2e0b2 is described below

commit dc2e0b2e44fdaa3d5ad0bb358ff8ce9db3bc7416
Author: Dragoș Moldovan-Grünfeld
AuthorDate: Fri Mar 11 10:34:40 2022 -0600

ARROW-14199 [R] bindings for format (where possible)

Closes #12319 from dragosmg/format_bindings

Authored-by: Dragoș Moldovan-Grünfeld
Signed-off-by: Jonathan Keane
---
 r/R/dplyr-funcs-datetime.R               |  20 ++
 r/R/dplyr-funcs-type.R                   |  18 ++
 r/tests/testthat/test-dplyr-funcs-type.R | 102 +++
 3 files changed, 140 insertions(+)

diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R
index 8f5a768..ea25f62 100644
--- a/r/R/dplyr-funcs-datetime.R
+++ b/r/R/dplyr-funcs-datetime.R
@@ -189,3 +189,23 @@ register_bindings_datetime <- function() {
     build_expr("cast", x, options = list(to_type = date32()))
   })
 }
+
+binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
+  if (usetz) {
+    format <- paste(format, "%Z")
+  }
+
+  if (call_binding("is.POSIXct", x)) {
+    # the casting part might not be required once
+    # https://issues.apache.org/jira/browse/ARROW-14442 is solved
+    # TODO revisit the steps below once the PR for that issue is merged
+    if (tz == "" && x$type()$timezone() != "") {
+      tz <- x$type()$timezone()
+    } else if (tz == "") {
+      tz <- Sys.timezone()
+    }
+    x <- build_expr("cast", x, options = cast_options(to_type = timestamp(x$type()$unit(), tz)))
+  }
+
+  build_expr("strftime", x, options = list(format = format, locale = Sys.getlocale("LC_TIME")))
+}
diff --git a/r/R/dplyr-funcs-type.R b/r/R/dplyr-funcs-type.R
index 7fd3a7c..1bb633d 100644
--- a/r/R/dplyr-funcs-type.R
+++ b/r/R/dplyr-funcs-type.R
@@ -20,6 +20,7 @@ register_bindings_type <- function() {
   register_bindings_type_cast()
   register_bindings_type_inspect()
   register_bindings_type_elementwise()
+  register_bindings_type_format()
 }
 
 register_bindings_type_cast <- function() {
@@ -292,3 +293,20 @@ register_bindings_type_elementwise <- function() {
     is_inf & !call_binding("is.na", is_inf)
   })
 }
+
+register_bindings_type_format <- function() {
+  register_binding("format", function(x, ...) {
+    # We use R's format if we get a single R object here since we don't (yet)
+    # support all of the possible options for casting to string
+    if (!inherits(x, "Expression")) {
+      return(format(x, ...))
+    }
+
+    if (inherits(x, "Expression") &&
+      x$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")]) {
+      binding_format_datetime(x, ...)
+    } else {
+      build_expr("cast", x, options = cast_options(to_type = string()))
+    }
+  })
+}
diff --git a/r/tests/testthat/test-dplyr-funcs-type.R b/r/tests/testthat/test-dplyr-funcs-type.R
index 9570ece..6c9d9ac 100644
--- a/r/tests/testthat/test-dplyr-funcs-type.R
+++ b/r/tests/testthat/test-dplyr-funcs-type.R
@@ -874,3 +874,105 @@ test_that("as.Date() converts successfully from date, timestamp, integer, char a
     test_df
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+    datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Pacific/Marquesas"), NA),
+    date = c(as.Date("2021-01-01"), NA)
+  )
+  formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %% %G %V %u"
+  formats_date <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %j %U %W %x %X %% %G %V %u"
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(date, format = formats_date)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "Europe/Bucharest")) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "EST", usetz = TRUE)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(1),
+             y = format(13.7, nsmall = 3)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input
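The `binding_format_datetime()` function in the diff above decides which format string and time zone to hand to the `strftime` kernel. The following C++ sketch restates only that decision logic as a pure function so the precedence is easy to see. It is an illustration under assumptions, not Arrow code: `ResolveFormatAndTimezone`, `column_tz`, and `system_tz` are hypothetical names, with `column_tz` standing in for `x$type()$timezone()` and `system_tz` for `Sys.timezone()`. In the R binding the time zone is only resolved for POSIXct inputs; the sketch assumes that case.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {format, tz} as the R binding would resolve
// them for a timestamp column.
std::pair<std::string, std::string> ResolveFormatAndTimezone(
    std::string format, std::string tz, bool usetz,
    const std::string& column_tz, const std::string& system_tz) {
  if (usetz) {
    // R's paste(format, "%Z") joins the two pieces with a single space.
    format += " %Z";
  }
  if (tz.empty()) {
    // An explicit tz wins; otherwise prefer the column's own time zone and
    // fall back to the system time zone.
    tz = !column_tz.empty() ? column_tz : system_tz;
  }
  return {format, tz};
}
```

With an empty `tz` argument and a column stored in "Pacific/Marquesas", the sketch resolves to the column's zone; passing `tz = "EST"` with `usetz = true` keeps "EST" and appends " %Z" to the format, matching the cases exercised by the tests in the diff.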
[arrow] branch master updated (a76794c -> 1b77e6d)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from a76794c  ARROW-15864: [Java][Docs] Update Arrow nightly Maven releases documentation
     add 1b77e6d  ARROW-15701 [R] month() should allow integer inputs

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  8 +--
 r/R/dplyr-funcs-datetime.R                   | 27 -
 r/tests/testthat/test-dplyr-funcs-datetime.R | 90
 3 files changed, 106 insertions(+), 19 deletions(-)
[arrow] branch master updated (7aecc83 -> 5772d65)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 7aecc83  ARROW-15847: [Python] Building with Parquet but without Parquet encryption fails
     add 5772d65  ARROW-15775 [R] Clean up as.* methods to use build_expr()

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-type.R                   | 12 +-
 r/tests/testthat/test-dplyr-funcs-type.R | 39
 2 files changed, 41 insertions(+), 10 deletions(-)
[arrow] branch master updated (28b7725 -> f5a0caf)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 28b7725  ARROW-15844: [Release][Packaging] Use ASCII format for detached sign
     add f5a0caf  ARROW-15743: [R] `skip` not connected up to `skip_rows` on open_dataset despite error messages indicating otherwise

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-format.R                | 69 +
 r/tests/testthat/test-dataset-csv.R | 15
 2 files changed, 69 insertions(+), 15 deletions(-)
[arrow] branch master updated (ce46c1a -> 9719eae)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from ce46c1a  ARROW-15831: [Java] Upgrade Flight dependencies
     add 9719eae  ARROW-14808 [R] Implement bindings for `lubridate::date()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  3 +
 r/R/dplyr-funcs-datetime.R                   |  3 +
 r/R/dplyr-funcs-type.R                       | 44 ++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 85
 r/tests/testthat/test-dplyr-funcs-type.R     | 75
 5 files changed, 210 insertions(+)
[arrow] branch master updated (cf0b21c -> 6cf79d6)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from cf0b21c  MINOR: [R][DOCS] fix typo
     add 6cf79d6  MINOR: [R] Fix errant trailing whitespace

No new revisions were added by this update.

Summary of changes:
 r/R/scalar.R    | 2 +-
 r/man/Scalar.Rd | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
[arrow] branch master updated (632f4e9 -> 16d0c8a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 632f4e9  MINOR: [R] Fix cheatsheet url in the r folder readme
     add 16d0c8a  MINOR: [R][DOCS] Replace GitHub issue numbers to JIRA issue numbers in the Changelog

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)
[arrow] branch master updated (fffdca2 -> acfd1d2)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from fffdca2  ARROW-15258: [C++] Easy options to create a source node from a table
     add acfd1d2  ARROW-15697: [R] Add logo and meta tags to pkgdown site

No new revisions were added by this update.

Summary of changes:
 r/_pkgdown.yml | 8
 1 file changed, 8 insertions(+)
[arrow] branch master updated (a916e60 -> effed6b)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from a916e60  MINOR: [R][DOCS] Fix link
     add effed6b  ARROW-15673 [R] Error gracefully if DuckDB isn't installed

No new revisions were added by this update.

Summary of changes:
 dev/tasks/docker-tests/github.linux.yml |  9 +
 r/R/duckdb.R                            |  4
 r/tests/testthat/_snaps/duckdb.md       |  7 +++
 r/tests/testthat/test-duckdb.R          | 19 +--
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 r/tests/testthat/_snaps/duckdb.md
[arrow] branch master updated (6aa30703 -> 194ace5)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6aa30703  ARROW-15604: [C++][CI] Sporadic ThreadSanitizer failure with OpenTracing
     add 194ace5  ARROW-14826 [R] Implement bindings for `lubridate::dst()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  1 +
 r/R/expression.R                             |  1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 15 +++
 3 files changed, 17 insertions(+)
[arrow] branch master updated (5680d20 -> 16f36a5)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 5680d20  ARROW-15727: [Python] Allow converting lists of MonthDayNano intervals to Pandas
     add 16f36a5  ARROW-14815 [R] bindings for `lubridate::semester()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  1 +
 r/R/dplyr-funcs-datetime.R                   | 10 +++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 41 ++--
 3 files changed, 50 insertions(+), 2 deletions(-)
[arrow] branch master updated (5216c2b -> 0eaafe8)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 5216c2b  ARROW-15348: [Doc][Guide] Lifecycle of a PR - minor corrections
     add 0eaafe8  ARROW-14817 [R] Implement bindings for `lubridate::tz()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md                                    |  3 +++
 r/R/dplyr-funcs-datetime.R                   |  6 +
 r/R/type.R                                   |  2 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 38
 r/tests/testthat/test-type.R                 |  9 ---
 5 files changed, 54 insertions(+), 4 deletions(-)
[arrow] branch master updated (5bedee4 -> f9f2c08)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 5bedee4  MINOR: [Docs] Update contributing.rst
     add f9f2c08  ARROW-15708: [R] [CI] skip snappy encoded parquets on clang sanitizer

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_sanitize.sh           | 3 +++
 r/tests/testthat.R                 | 5 -
 r/tests/testthat/test-dataset.R    | 5 -
 r/tests/testthat/test-dplyr-join.R | 5 +
 4 files changed, 16 insertions(+), 2 deletions(-)
[arrow] branch master updated (ee9354d -> 3ce4f81)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from ee9354d  ARROW-15690: [Dev] Update GitHub Actions workflows that hardcode master as default
     add 3ce4f81  ARROW-15468: [R] [CI] A crossbow job that tests against DuckDB's dev branch

No new revisions were added by this update.

Summary of changes:
 ci/docker/linux-apt-r.dockerfile |  3 +++
 ci/scripts/r_deps.sh             | 12 ++--
 dev/tasks/r/azure.linux.yml      |  6 ++
 dev/tasks/tasks.yml              | 27 +--
 docker-compose.yml               |  4
 r/tests/testthat/test-duckdb.R   | 20 ++--
 6 files changed, 50 insertions(+), 22 deletions(-)
[arrow] branch master updated (6a2ee11 -> cca3800)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6a2ee11  PARQUET-2124: [C++] Remove Parquet Dictionary DCHECK
     add cca3800  ARROW-15013: [R] Expose concatenate at the R level

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE                   |  2 ++
 r/R/array.R                   | 43 +++
 r/R/arrowExports.R            |  4 +++
 r/_pkgdown.yml                |  1 +
 r/man/concat_arrays.Rd        | 34 +
 r/src/array.cpp               | 13
 r/src/arrowExports.cpp        | 16 ++
 r/tests/testthat/test-Array.R | 69 +++
 8 files changed, 182 insertions(+)
 create mode 100644 r/man/concat_arrays.Rd
[arrow] branch master updated (699449f -> 5ad5ddc)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 699449f  MINOR: [Docs][Archery] Correct the links in the README.md
     add 5ad5ddc  ARROW-15606: [CI] [R] Add brew build that exercises the R package

No new revisions were added by this update.

Summary of changes:
 dev/tasks/homebrew-formulae/apache-arrow.rb        |  4 +++
 dev/tasks/homebrew-formulae/github.macos.yml       | 42 +++---
 dev/tasks/macros.jinja                             | 29 +++
 ...ub.macos.autobrew.yml => github.macos.brew.yml} | 19 --
 dev/tasks/tasks.yml                                |  9 +
 5 files changed, 53 insertions(+), 50 deletions(-)
 copy dev/tasks/r/{github.macos.autobrew.yml => github.macos.brew.yml} (70%)
[arrow] branch master updated (7018a4b -> 0a56006)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 7018a4b  ARROW-15595: [Release][Ruby] Add support for MFA
     add 0a56006  ARROW-15020: [R] Add bindings for new dataset writing options

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R                    |   4 +-
 r/R/dataset-write.R                   |  36 +++-
 r/man/write_dataset.Rd                |  23 +
 r/src/arrowExports.cpp                |  14 +--
 r/src/dataset.cpp                     |   8 +-
 r/tests/testthat/test-dataset-write.R | 155 ++
 6 files changed, 228 insertions(+), 12 deletions(-)
[arrow] branch master updated (6fa5891 -> 43efadb)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 6fa5891  MINOR: [R] Update a URL to https
     add 43efadb  MINOR: [R] run document(), document missing parameters

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE            |  3 +++
 r/R/type.R             |  5 -
 r/man/FileFormat.Rd    |  2 +-
 r/man/array.Rd         |  1 +
 r/man/data-type.Rd     |  9 +
 r/man/open_dataset.Rd  |  2 +-
 r/src/arrowExports.cpp | 10 +-
 7 files changed, 24 insertions(+), 8 deletions(-)
[arrow] branch master updated (11caf00 -> ee7897a)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 11caf00  ARROW-15570: [CI][Nightly] Drop centos-8 R nightly job
     add ee7897a  ARROW-15605: [CI] [R] Keep using old macos runners on our autobrew CI job

No new revisions were added by this update.

Summary of changes:
 dev/tasks/r/github.macos.autobrew.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow] branch master updated (858470d -> 11caf00)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 858470d  ARROW-14745: [R] Enable true duckdb streaming
     add 11caf00  ARROW-15570: [CI][Nightly] Drop centos-8 R nightly job

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 1 -
 1 file changed, 1 deletion(-)
[arrow] branch master updated (501d92e -> 858470d)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from 501d92e  ARROW-15080: [Python][C++] Enable tuples conversion to interval
     add 858470d  ARROW-14745: [R] Enable true duckdb streaming

No new revisions were added by this update.

Summary of changes:
 r/R/arrow-package.R             |  2 +-
 r/R/dplyr-arrange.R             |  2 +-
 r/R/dplyr-collect.R             |  8 +++---
 r/R/dplyr-count.R               |  4 +--
 r/R/dplyr-distinct.R            |  2 +-
 r/R/dplyr-filter.R              |  2 +-
 r/R/dplyr-group-by.R            |  8 +++---
 r/R/dplyr-join.R                | 12 -
 r/R/dplyr-mutate.R              |  4 +--
 r/R/dplyr-select.R              |  6 ++---
 r/R/dplyr-summarize.R           |  2 +-
 r/R/duckdb.R                    | 19 -
 r/man/to_arrow.Rd               |  8 --
 r/tests/testthat/test-dataset.R | 54 -
 r/tests/testthat/test-duckdb.R  | 59 -
 15 files changed, 144 insertions(+), 48 deletions(-)
[arrow] branch master updated (d403fd5 -> b39c5a0)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from d403fd5  ARROW-15532: [C++] Fix unused warning for StringClassifyDoc
     add b39c5a0  ARROW-14169: [R] altrep for factors

No new revisions were added by this update.

Summary of changes:
 r/src/altrep.cpp               | 371 ++---
 r/src/array_to_vector.cpp      |  41 ++---
 r/src/arrow_types.h            |   3 +
 r/tests/testthat/test-altrep.R |  22 +++
 4 files changed, 396 insertions(+), 41 deletions(-)
[arrow] branch master updated (d885d82 -> f48cabe)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from d885d82  ARROW-15546: [FlightRPC][C++] Remove quotes from cookie header
     add f48cabe  ARROW-15480: [R] Expand on schema/colnames mismatch error messages

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-format.R                | 30 --
 r/tests/testthat/test-dataset-csv.R | 26 +-
 2 files changed, 49 insertions(+), 7 deletions(-)
[arrow] branch master updated (fb5a4f6 -> c89e67d)
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.

    from fb5a4f6  ARROW-15520: [C++] Qualify `arrow_vendored::date::format()` for C++20 compatibility
     add c89e67d  ARROW-15539: [Archery] Add ARROW_JEMALLOC to build options

No new revisions were added by this update.

Summary of changes:
 dev/archery/archery/lang/cpp.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
[arrow] branch master updated (d747326 -> d4e16a5)
This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.
    from d747326 ARROW-14095: [C++] subtract(timestamp, duration) -> timestamp kernel
     add d4e16a5 MINOR: [R] Fix misalignment in arrow.Rmd vignette
No new revisions were added by this update.
Summary of changes:
 r/vignettes/arrow.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[arrow] branch master updated (c5b757f -> f92219d)
This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.
    from c5b757f ARROW-14419 [R] Add filter + join test
     add f92219d ARROW-10456: [R] Implement MapType and MapArray
No new revisions were added by this update.
Summary of changes:
 r/R/array.R                       |  14 +++
 r/R/arrowExports.R                |  61 --
 r/R/type.R                        |  18 +++
 r/src/array.cpp                   |  28 +
 r/src/array_to_vector.cpp         |  15 ++-
 r/src/arrowExports.cpp            | 238 --
 r/src/datatype.cpp                |  59 ++
 r/tests/testthat/test-Array.R     |  23
 r/tests/testthat/test-data-type.R |  38 ++
 r/tests/testthat/test-parquet.R   |  30 +
 r/tests/testthat/test-type.R      |   8 ++
 r/vignettes/arrow.Rmd             |   2 +-
 12 files changed, 482 insertions(+), 52 deletions(-)
[arrow] branch master updated (07ec0a1 -> c5b757f)
This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.
    from 07ec0a1 ARROW-14461 [R] write_dataset() allows users to pass invalid additional arguments
     add c5b757f ARROW-14419 [R] Add filter + join test
No new revisions were added by this update.
Summary of changes:
 r/tests/testthat/test-dplyr-join.R | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)
[arrow] branch master updated (39367db -> 07ec0a1)
This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/arrow.git.
    from 39367db ARROW-15126: [C++] Support Null type as group keys
     add 07ec0a1 ARROW-14461 [R] write_dataset() allows users to pass invalid additional arguments
No new revisions were added by this update.
Summary of changes:
 r/R/dataset-format.R                     | 42 ++-
 r/R/dataset-write.R                      |  4 +--
 r/man/ChunkedArray.Rd                    |  1 +
 r/man/RecordBatch.Rd                     |  1 +
 r/man/Table.Rd                           |  1 +
 r/man/array.Rd                           |  1 +
 r/man/write_dataset.Rd                   |  4 +--
 r/tests/testthat/_snaps/dataset-write.md | 49
 r/tests/testthat/test-dataset-write.R    | 34 ++
 9 files changed, 132 insertions(+), 5 deletions(-)
 create mode 100644 r/tests/testthat/_snaps/dataset-write.md