(arrow) branch main updated: GH-41841: [R][CI] Remove more defunct rhub containers (#41828)

2024-05-28 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 8f3bf67cca GH-41841: [R][CI] Remove more defunct rhub containers 
(#41828)
8f3bf67cca is described below

commit 8f3bf67cca32902e241b1857502247918861a3f8
Author: Jonathan Keane 
AuthorDate: Tue May 28 17:26:09 2024 -0500

GH-41841: [R][CI] Remove more defunct rhub containers (#41828)

Testing CI to see if we can replicate the incoming NOTEs:

```
Found the following (possibly) invalid file URIs:
  URI: articles/read_write.html
From: README.md
  URI: articles/data_wrangling.html
From: README.md
  URI: reference/acero.html
From: README.md
  URI: articles/install.html
From: README.md
  URI: articles/install_nightly.html
From: README.md
```

I wasn't able to replicate them in CI (even with 
`_R_CHECK_CRAN_INCOMING_REMOTE_` set to true, and installing pandoc so that the 
docs could be munged.)

But in the process realized we were running old rhub images that aren't 
updated anymore (thanks, @thisisnic). Also did a bit of cleanup of 
`--run-donttest` which is now no longer needed (was removed in favor of the env 
var in 4.0)
* GitHub Issue: #41841

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 .github/workflows/r.yml |  5 ++--
 ci/scripts/r_install_system_dependencies.sh | 43 +++--
 ci/scripts/r_test.sh|  9 +++---
 dev/tasks/r/github.linux.cran.yml   |  9 +++---
 r/Makefile  |  4 +--
 5 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index aba7734765..6bd940f806 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -370,11 +370,12 @@ jobs:
 MAKEFLAGS = paste0("-j", parallel::detectCores()),
 ARROW_R_DEV = TRUE,
 "_R_CHECK_FORCE_SUGGESTS_" = FALSE,
-"_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = TRUE
+"_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_" = TRUE,
+"_R_CHECK_DONTTEST_EXAMPLES_" = TRUE
   )
   rcmdcheck::rcmdcheck(".",
 build_args = '--no-build-vignettes',
-args = c('--no-manual', '--as-cran', '--ignore-vignettes', 
'--run-donttest'),
+args = c('--no-manual', '--as-cran', '--ignore-vignettes'),
 error_on = 'warning',
 check_dir = 'check',
 timeout = 3600
diff --git a/ci/scripts/r_install_system_dependencies.sh 
b/ci/scripts/r_install_system_dependencies.sh
index be0d75ef23..7ddc2604f6 100755
--- a/ci/scripts/r_install_system_dependencies.sh
+++ b/ci/scripts/r_install_system_dependencies.sh
@@ -21,29 +21,30 @@ set -ex
 
 : ${ARROW_SOURCE_HOME:=/arrow}
 
-if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_GCS" == "ON" ] || [ "$ARROW_R_DEV" == 
"TRUE" ]; then
-  # Figure out what package manager we have
-  if [ "`which dnf`" ]; then
-PACKAGE_MANAGER=dnf
-  elif [ "`which yum`" ]; then
-PACKAGE_MANAGER=yum
-  elif [ "`which zypper`" ]; then
-PACKAGE_MANAGER=zypper
-  else
-PACKAGE_MANAGER=apt-get
-apt-get update
-  fi
+# Figure out what package manager we have
+if [ "`which dnf`" ]; then
+  PACKAGE_MANAGER=dnf
+elif [ "`which yum`" ]; then
+  PACKAGE_MANAGER=yum
+elif [ "`which zypper`" ]; then
+  PACKAGE_MANAGER=zypper
+else
+  PACKAGE_MANAGER=apt-get
+  apt-get update
+fi
 
-  # Install curl and OpenSSL for S3/GCS support
-  case "$PACKAGE_MANAGER" in
-apt-get)
-  apt-get install -y libcurl4-openssl-dev libssl-dev
-  ;;
-*)
-  $PACKAGE_MANAGER install -y libcurl-devel openssl-devel
-  ;;
-  esac
+# Install curl and OpenSSL (technically, only needed for S3/GCS support, but
+# installing the R curl package fails without it)
+case "$PACKAGE_MANAGER" in
+  apt-get)
+apt-get install -y libcurl4-openssl-dev libssl-dev
+;;
+  *)
+$PACKAGE_MANAGER install -y libcurl-devel openssl-devel
+;;
+esac
 
+if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_GCS" == "ON" ] || [ "$ARROW_R_DEV" == 
"TRUE" ]; then
   # The Dockerfile should have put this file here
   if [ "$ARROW_S3" == "ON" ] && [ -f 
"${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh" ] && [ "`which wget`" ]; then
 "${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh" latest /usr/local
diff --git a/ci/scripts/r_test.sh b/ci/scripts/r_test.sh
index e13da45e
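
The package-manager detection that this commit moves out of the S3/GCS conditional can be sketched in isolation. This is a minimal illustration, not the script itself: the function names `detect_package_manager` and `dev_packages_for` are hypothetical, `command -v` is swapped in for the backtick `which` calls used in the diff, and nothing is actually installed. Only the package names are taken from the diff above.

```shell
#!/bin/sh
# Hypothetical helper names; only the package names come from the diff above.

# Print the name of the first package manager found on PATH.
detect_package_manager() {
  for pm_candidate in dnf yum zypper; do
    if command -v "$pm_candidate" >/dev/null 2>&1; then
      echo "$pm_candidate"
      return 0
    fi
  done
  # Fall back to apt-get, as the script in the diff does.
  echo "apt-get"
}

# Print the curl/OpenSSL development packages for a given package manager.
dev_packages_for() {
  case "$1" in
    apt-get) echo "libcurl4-openssl-dev libssl-dev" ;;
    *)       echo "libcurl-devel openssl-devel" ;;
  esac
}

pm="$(detect_package_manager)"
# Dry run only: show the command instead of executing it.
echo "would run: $pm install -y $(dev_packages_for "$pm")"
```

Keeping the lookup separate from the install step makes the mapping easy to check without root access or network connectivity.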

(arrow) annotated tag r-universe-release updated (f39b3f343a -> 705303e3d9)

2024-05-25 Thread jonkeane
jonkeane pushed a change to annotated tag r-universe-release
in repository https://gitbox.apache.org/repos/asf/arrow.git


*** WARNING: tag r-universe-release was modified! ***

from f39b3f343a (commit)
  to 705303e3d9 (tag)
 tagging f39b3f343acc435333e6502b817e3be40ce54543 (commit)
 replaces apache-arrow-16.1.0
  by Jonathan Keane
  on Sat May 25 17:34:07 2024 -0500

- Log -
latest R package release on r-universe
---


No new revisions were added by this update.

Summary of changes:



(arrow) tag r-universe-release deleted (was ac9707663c)

2024-05-25 Thread jonkeane
jonkeane pushed a change to tag r-universe-release
in repository https://gitbox.apache.org/repos/asf/arrow.git


*** WARNING: tag r-universe-release was deleted! ***

 was ac9707663c Remove badges in README

The revisions that were on this tag are still contained in
other references; therefore, this change does not discard any commits
from the repository.



(arrow) branch main updated: GH-41450: [R][CI] rhub/container follow ons (#41451)

2024-05-12 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 6d03215543 GH-41450: [R][CI] rhub/container follow ons (#41451)
6d03215543 is described below

commit 6d0321554374523ae0633d6bfe42cdeeb3b5d145
Author: Jonathan Keane 
AuthorDate: Sun May 12 11:00:26 2024 -0400

GH-41450: [R][CI] rhub/container follow ons (#41451)

More CI changes:

* GitHub Issue: #41450 (specifically use the rhub containers approach for 
clang sanitizer, remove some of our workarounds)
* Remove CentOS 7 CI support for R

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 .env   |  3 ---
 .github/workflows/r.yml|  3 +--
 ci/docker/linux-r.dockerfile   |  3 ---
 ci/scripts/java_jni_manylinux_build.sh |  3 ---
 ci/scripts/r_docker_configure.sh   | 20 
 ci/scripts/r_sanitize.sh   |  2 ++
 ci/scripts/r_test.sh   |  3 ---
 dev/tasks/r/azure.linux.yml|  1 -
 dev/tasks/r/github.packages.yml|  7 +++
 dev/tasks/tasks.yml| 13 ++---
 docker-compose.yml | 16 ++--
 r/tools/test-nixlibs.R |  4 
 r/tools/ubsan.supp |  1 +
 r/vignettes/install.Rmd| 33 -
 14 files changed, 15 insertions(+), 97 deletions(-)

diff --git a/.env b/.env
index ab2e4b4fbe..27474b2c73 100644
--- a/.env
+++ b/.env
@@ -86,9 +86,6 @@ ARROW_R_DEV=TRUE
 R_PRUNE_DEPS=FALSE
 TZ=UTC
 
-# Any non-empty string will install devtoolset-${DEVTOOLSET_VERSION}
-DEVTOOLSET_VERSION=
-
 # Used through docker-compose.yml and serves as the default version for the
 # ci/scripts/install_vcpkg.sh script. Prefer to use short SHAs to keep the
 # docker tags more readable.
diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 8228aaad7c..aba7734765 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -192,12 +192,11 @@ jobs:
   fail-fast: false
   matrix:
 config:
-  - { org: "rhub", image: "ubuntu-gcc12", tag: "latest", devtoolset: 
"" }
+  - { org: "rhub", image: "ubuntu-gcc12", tag: "latest" }
 env:
   R_ORG: ${{ matrix.config.org }}
   R_IMAGE: ${{ matrix.config.image }}
   R_TAG: ${{ matrix.config.tag }}
-  DEVTOOLSET_VERSION: ${{ matrix.config.devtoolset }}
 steps:
   - name: Checkout Arrow
 uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # 
v4.0.0
diff --git a/ci/docker/linux-r.dockerfile b/ci/docker/linux-r.dockerfile
index d368a6629c..7b7e989adc 100644
--- a/ci/docker/linux-r.dockerfile
+++ b/ci/docker/linux-r.dockerfile
@@ -27,9 +27,6 @@ ENV R_BIN=${r_bin}
 ARG r_dev=FALSE
 ENV ARROW_R_DEV=${r_dev}
 
-ARG devtoolset_version=
-ENV DEVTOOLSET_VERSION=${devtoolset_version}
-
 ARG r_prune_deps=FALSE
 ENV R_PRUNE_DEPS=${r_prune_deps}
 
diff --git a/ci/scripts/java_jni_manylinux_build.sh 
b/ci/scripts/java_jni_manylinux_build.sh
index da4987d307..4921ce170b 100755
--- a/ci/scripts/java_jni_manylinux_build.sh
+++ b/ci/scripts/java_jni_manylinux_build.sh
@@ -35,9 +35,6 @@ echo "=== Clear output directories and leftovers ==="
 rm -rf ${build_dir}
 
 echo "=== Building Arrow C++ libraries ==="
-devtoolset_version=$(rpm -qa "devtoolset-*-gcc" --queryformat %{VERSION} | \
-   grep -o "^[0-9]*")
-devtoolset_include_cpp="/opt/rh/devtoolset-${devtoolset_version}/root/usr/include/c++/${devtoolset_version}"
 : ${ARROW_ACERO:=ON}
 export ARROW_ACERO
 : ${ARROW_BUILD_TESTS:=ON}
diff --git a/ci/scripts/r_docker_configure.sh b/ci/scripts/r_docker_configure.sh
index 52db2e6df6..8a962fe576 100755
--- a/ci/scripts/r_docker_configure.sh
+++ b/ci/scripts/r_docker_configure.sh
@@ -67,26 +67,6 @@ sloppiness = include_file_ctime
 hash_dir = false" >> ~/.ccache/ccache.conf
 fi
 
-# Special hacking to try to reproduce quirks on centos using non-default build
-# tooling.
-if [[ -n "$DEVTOOLSET_VERSION" ]]; then
-  $PACKAGE_MANAGER install -y centos-release-scl
-  $PACKAGE_MANAGER install -y "devtoolset-$DEVTOOLSET_VERSION"
-
-  # Enable devtoolset here so that `which gcc` finds the right compiler below
-  source /opt/rh/devtoolset-${DEVTOOLSET_VERSION}/enable
-
-  # Build images which require the devtoolset don't have CXX17 variables
-  # set as the system compiler doesn't support C++17
-  if [ ! "`{R_BIN} CMD config CXX17`" ]; then
-mkdir -p ~/.R
-echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars
-echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars

(arrow) branch main updated: GH-41402: [CI][R] Update our backwards compatibility CI any other R 4.4 cleanups (#41403)

2024-04-29 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 6eb0b37386 GH-41402: [CI][R] Update our backwards compatibility CI any 
other R 4.4 cleanups (#41403)
6eb0b37386 is described below

commit 6eb0b37386ecbfc4108e914d6dadb8b049a6f549
Author: Jonathan Keane 
AuthorDate: Mon Apr 29 08:39:07 2024 -0500

GH-41402: [CI][R] Update our backwards compatibility CI any other R 4.4 
cleanups (#41403)

### Rationale for this change

Keep up with the state of the world, ensure we are maintaining backwards 
compatibility.

Resolves #41402

### What changes are included in this PR?

* Bump to 4.4 as the release
* Remove old 3.6 jobs now that we no longer support that; clean up code 
where we hardcode things for 3.6 and below
* Move many of our CI jobs to [rhub's new 
containers](https://github.com/r-hub/containers). We were accidentally running 
stale R devel (from December 2023) because the other rhub images stopped being 
updated. (One exception to be done as a follow on: #41416)
* Resolve a number of extended test failures

With this PR R extended tests should be all green with the exceptions of:

* Two sanitizer jobs (test-fedora-r-clang-sanitizer, 
test-ubuntu-r-sanitizer) — which are being investigated / fixed in #41421
* Valgrind — I'm running one last run with a new suppression file.
* Binary jobs — these work but fail at upload, see 
https://github.com/apache/arrow/pull/41403#discussion_r1582245207
* Windows R Release — failing on main, #41398

### Are these changes tested?

By definition.

### Are there any user-facing changes?

No.

* GitHub Issue: #41402

Lead-authored-by: Jonathan Keane 
Co-authored-by: Jacob Wujciak-Jens 
Signed-off-by: Jonathan Keane 
---
 .env   |  6 +--
 .github/workflows/r.yml|  4 +-
 ci/docker/linux-apt-docs.dockerfile|  2 +-
 ci/docker/linux-apt-lint.dockerfile|  2 +-
 ci/docker/linux-apt-r.dockerfile   |  2 +-
 ci/etc/valgrind-cran.supp  | 20 +++-
 ci/scripts/r_sanitize.sh   |  4 +-
 ci/scripts/r_test.sh   |  7 ++-
 ci/scripts/r_valgrind.sh   |  2 +-
 .../r/github.linux.arrow.version.back.compat.yml   |  2 +
 dev/tasks/r/github.linux.offline.build.yml |  2 +-
 dev/tasks/r/github.linux.versions.yml  |  2 +-
 dev/tasks/r/github.packages.yml| 10 ++--
 dev/tasks/tasks.yml| 12 ++---
 docker-compose.yml |  5 +-
 r/DESCRIPTION  |  2 +-
 r/R/dplyr-funcs-type.R |  2 +-
 r/R/util.R | 14 --
 r/tests/testthat/test-Array.R  |  5 --
 r/tests/testthat/test-RecordBatch.R| 16 ++-
 r/tests/testthat/test-Table.R  |  4 --
 r/tests/testthat/test-altrep.R |  7 ++-
 r/tests/testthat/test-chunked-array.R  |  5 --
 r/tests/testthat/test-dplyr-collapse.R | 10 
 r/tests/testthat/test-dplyr-funcs-datetime.R   | 32 +++--
 r/tests/testthat/test-dplyr-funcs-type.R   |  3 +-
 r/tests/testthat/test-dplyr-glimpse.R  |  5 --
 r/tests/testthat/test-scalar.R |  4 --
 r/tools/test-nixlibs.R |  7 ++-
 r/vignettes/developers/docker.Rmd  | 50 ++--
 r/vignettes/install.Rmd| 55 ++
 31 files changed, 139 insertions(+), 164 deletions(-)

diff --git a/.env b/.env
index d9f875a4d4..ab2e4b4fbe 100644
--- a/.env
+++ b/.env
@@ -71,12 +71,12 @@ NUMBA=latest
 NUMPY=latest
 PANDAS=latest
 PYTHON=3.8
-R=4.2
+R=4.4
 SPARK=master
 TURBODBC=latest
 
-# These correspond to images on Docker Hub that contain R, e.g. 
rhub/ubuntu-gcc-release:latest
-R_IMAGE=ubuntu-gcc-release
+# These correspond to images on Docker Hub that contain R, e.g. 
rhub/ubuntu-release:latest
+R_IMAGE=ubuntu-release
 R_ORG=rhub
 R_TAG=latest
 
diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 05c85fa6dc..8228aaad7c 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -121,7 +121,7 @@ jobs:
 strategy:
   fail-fast: false
   matrix:
-r: ["4.3"]
+r: ["4.4"]
 ubuntu: [20.04]
 force-tests: ["true"]
 env:
@@ -192,7 +192,7 @@ jobs:
   fail-fast: false
   matrix:
 config:
-  - { org: "rhub", image: "

(arrow) branch main updated (be3baf2697 -> 3886cf1d43)

2024-04-05 Thread jonkeane
jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from be3baf2697 GH-40680: [Java] Test JDK 22 in CI (#41038)
 add 3886cf1d43 GH-40991: [R] Prefer r-universe, add a startup message 
(#41019)

No new revisions were added by this update.

Summary of changes:
 r/DESCRIPTION  |  2 +-
 r/R/arrow-info.R   |  3 ++-
 r/R/arrow-package.R| 71 +++---
 r/R/install-arrow.R| 14 +++---
 r/man/arrow-package.Rd |  4 +--
 r/man/format_schema.Rd | 18 +
 6 files changed, 70 insertions(+), 42 deletions(-)
 create mode 100644 r/man/format_schema.Rd



(arrow) branch main updated (81c9d30be3 -> 640667664a)

2024-03-02 Thread jonkeane
jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 81c9d30be3 GH-40155: [Go][FlightRPC][FlightSQL] Implement Session 
Management (#40284)
 add 640667664a GH-40323: [R] [CI] Use rocker/r-ver instead of 
library/r-base (#40321)

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)



(arrow) branch main updated: GH-40268: [Archery] Bump the version of pygit2, adapt to API changes (#40269)

2024-03-01 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 30e6d72242 GH-40268: [Archery] Bump the version of pygit2, adapt to 
API changes (#40269)
30e6d72242 is described below

commit 30e6d72242e376baa598b2e8f1d9b80d800a974c
Author: Jonathan Keane 
AuthorDate: Fri Mar 1 07:40:09 2024 -0600

GH-40268: [Archery] Bump the version of pygit2, adapt to API changes 
(#40269)

### Rationale for this change

`archery crossbow submit ...` fails with newer versions of pygit2

### What changes are included in this PR?

Adapt away from deprecated [sic] APIs in pygit2 to ones that work with 
current versions, bump the pin

### Are these changes tested?

Manually, yes, I can use `archery crossbow submit ...` again. CI will run 
using archery in a bunch of places on this PR too.

### Are there any user-facing changes?

No
* GitHub Issue: #40268

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 .github/workflows/archery.yml| 2 +-
 .github/workflows/comment_bot.yml| 2 +-
 .github/workflows/dev.yml| 4 ++--
 .github/workflows/docs.yml   | 2 +-
 .github/workflows/docs_light.yml | 2 +-
 .github/workflows/java_nightly.yml   | 2 +-
 .github/workflows/pr_bot.yml | 2 +-
 .github/workflows/r_nightly.yml  | 6 +++---
 dev/archery/archery/crossbow/core.py | 2 +-
 dev/archery/archery/docker/cli.py| 2 +-
 dev/archery/setup.py | 6 +-
 dev/tasks/java-jars/github.yml   | 2 +-
 dev/tasks/macros.jinja   | 4 ++--
 13 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/.github/workflows/archery.yml b/.github/workflows/archery.yml
index d5f419f8a7..dbd24796db 100644
--- a/.github/workflows/archery.yml
+++ b/.github/workflows/archery.yml
@@ -59,7 +59,7 @@ jobs:
   - name: Setup Python
 uses: actions/setup-python@v5
 with:
-  python-version: '3.8'
+  python-version: '3.12'
   - name: Install pygit2 binary wheel
 run: pip install pygit2 --only-binary pygit2
   - name: Install Archery, Crossbow- and Test Dependencies
diff --git a/.github/workflows/comment_bot.yml 
b/.github/workflows/comment_bot.yml
index dbcbbff549..038a468a81 100644
--- a/.github/workflows/comment_bot.yml
+++ b/.github/workflows/comment_bot.yml
@@ -43,7 +43,7 @@ jobs:
   - name: Set up Python
 uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # 
v5.0.0
 with:
-  python-version: 3.8
+  python-version: 3.12
   - name: Install Archery and Crossbow dependencies
 run: pip install -e arrow/dev/archery[bot]
   - name: Handle GitHub comment event
diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
index 4892767324..77efda58cb 100644
--- a/.github/workflows/dev.yml
+++ b/.github/workflows/dev.yml
@@ -43,7 +43,7 @@ jobs:
   - name: Setup Python
 uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # 
v5.0.0
 with:
-  python-version: 3.8
+  python-version: 3.12
   - name: Setup Archery
 run: pip install -e dev/archery[docker]
   - name: Execute Docker Build
@@ -90,7 +90,7 @@ jobs:
   - name: Install Python
 uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # 
v5.0.0
 with:
-  python-version: '3.8'
+  python-version: '3.12'
   - name: Install Ruby
 uses: ruby/setup-ruby@250fcd6a742febb1123a77a841497ccaa8b9e939 # 
v1.152.0
 with:
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index e394347e95..82b43ee236 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -53,7 +53,7 @@ jobs:
   - name: Setup Python
 uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # 
v5.0.0
 with:
-  python-version: 3.8
+  python-version: 3.12
   - name: Setup Archery
 run: pip install -e dev/archery[docker]
   - name: Execute Docker Build
diff --git a/.github/workflows/docs_light.yml b/.github/workflows/docs_light.yml
index 5303531f34..306fc51350 100644
--- a/.github/workflows/docs_light.yml
+++ b/.github/workflows/docs_light.yml
@@ -59,7 +59,7 @@ jobs:
   - name: Setup Python
 uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # 
v5.0.0
 with:
-  python-version: 3.8
+  python-version: 3.12
   - name: Setup Archery
 run: pip install -e dev/archery[docker]
   - name: Execute Docker Build
diff --git a/.github/workflows/java_nightly.yml 
b/.github/workflows/java_nightly.yml
index c19576d2f6..c535dc4a07 100644
--- a/.github/workflows/java_nightly.yml
+++ b/.github/workflows

(arrow) branch main updated (c6f20a2348 -> 2fbf22a736)

2024-02-28 Thread jonkeane
jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from c6f20a2348 GH-40276: [C++] Fix an simple buffer-overflow case in 
decimal_benchmark (#40277)
 add 2fbf22a736 GH-40248: [R] fallback to the correct libtool when we find 
a GNU one (#40259)

No new revisions were added by this update.

Summary of changes:
 cpp/cmake_modules/BuildUtils.cmake   | 22 +-
 dev/tasks/r/github.macos-linux.local.yml | 12 
 2 files changed, 33 insertions(+), 1 deletion(-)



(arrow) branch main updated (4ceb661013 -> b684028dfb)

2024-02-01 Thread jonkeane
jonkeane pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 4ceb661013 GH-39880: [Python][CI] Pin moto<5 for dask integration 
tests (#39881)
 add b684028dfb GH-39859: [R] Remove macOS from the allow list (#39861)

No new revisions were added by this update.

Summary of changes:
 r/tools/nixlibs-allowlist.txt | 1 -
 r/tools/nixlibs.R | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)



[arrow] branch main updated: GH-38216: [R] open_dataset(format = "json") not documented (#38258)

2023-10-17 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new ac581fd2a8 GH-38216: [R] open_dataset(format = "json") not documented 
(#38258)
ac581fd2a8 is described below

commit ac581fd2a87b35c872cf334bb147851fe1287714
Author: Divyansh200102 <146909065+divyansh200...@users.noreply.github.com>
AuthorDate: Tue Oct 17 23:47:32 2023 +0530

GH-38216: [R] open_dataset(format = "json") not documented (#38258)

fixes #38216
* Closes: #38216

Lead-authored-by: Divyansh200102 
Co-authored-by: Divyansh200102 
<146909065+divyansh200...@users.noreply.github.com>
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/R/dataset.R | 3 ++-
 r/man/open_dataset.Rd | 7 ---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/r/R/dataset.R b/r/R/dataset.R
index 90e6516927..2400d08393 100644
--- a/r/R/dataset.R
+++ b/r/R/dataset.R
@@ -112,7 +112,8 @@
 #' * "csv"/"text", aliases for the same thing (because comma is the default
 #'   delimiter for text files
 #' * "tsv", equivalent to passing `format = "text", delimiter = "\t"`
-#'
+#' * "json", for JSON format datasets Note: only newline-delimited JSON (aka 
ND-JSON) datasets
+#'   are currently supported
 #' Default is "parquet", unless a `delimiter` is also specified, in which case
 #' it is assumed to be "text".
 #' @param ... additional arguments passed to `dataset_factory()` when `sources`
diff --git a/r/man/open_dataset.Rd b/r/man/open_dataset.Rd
index 94b537a1d3..7c3d32289f 100644
--- a/r/man/open_dataset.Rd
+++ b/r/man/open_dataset.Rd
@@ -74,10 +74,11 @@ only version 2 files are supported
 \item "csv"/"text", aliases for the same thing (because comma is the default
 delimiter for text files
 \item "tsv", equivalent to passing \verb{format = "text", delimiter = "\\t"}
-}
-
+\item "json", for JSON format datasets Note: only newline-delimited JSON (aka 
ND-JSON) datasets
+are currently supported
 Default is "parquet", unless a \code{delimiter} is also specified, in which 
case
-it is assumed to be "text".}
+it is assumed to be "text".
+}}
 
 \item{factory_options}{list of optional FileSystemFactoryOptions:
 \itemize{



[arrow] branch main updated: MINOR: [R] Avoid stray output from expr when checking for 10.13 (#38303)

2023-10-17 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 40571db03c MINOR: [R] Avoid stray output from expr when checking for 
10.13 (#38303)
40571db03c is described below

commit 40571db03cc7f819f33a05dd421ef86816fe0502
Author: Jacob Wujciak-Jens 
AuthorDate: Tue Oct 17 17:22:18 2023 +0200

MINOR: [R] Avoid stray output from expr when checking for 10.13 (#38303)

### Rationale for this change

`expr` was printing the number of matching chars which showed up as noise 
in the log (which we want to avoid as much as possible to avoid any false 
positive checks)
See https://github.com/apache/arrow/pull/38236#issuecomment-1761679457 for 
@jonkeane's investigation.

### What changes are included in this PR?

Replace use of expr with test.

### Are these changes tested?
Crossbow

Lead-authored-by: Jacob Wujciak-Jens 
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/configure | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/r/configure b/r/configure
index addf7b59c7..c957c9946f 100755
--- a/r/configure
+++ b/r/configure
@@ -264,7 +264,10 @@ set_pkg_vars () {
 PKG_CFLAGS="$PKG_CFLAGS $ARROW_R_CXXFLAGS"
   fi
 
-  if [ "$UNAME" = "Darwin" ] && expr $(sw_vers -productVersion) : '10\.13'; 
then
+  # We use expr because the product version returns more than just 10.13 and 
we want to 
+  # match the substring. However, expr always outputs the number of matched 
characters
+  # to stdout, to avoid noise in the log we redirect the output to /dev/null
+  if [ "$UNAME" = "Darwin" ] && expr $(sw_vers -productVersion) : '10\.13' 
>/dev/null 2>&1; then
 # avoid C++17 availability warnings on macOS < 11
 PKG_CFLAGS="$PKG_CFLAGS -D_LIBCPP_DISABLE_AVAILABILITY"
   fi
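
The behavior described in the rationale can be reproduced outside of `r/configure`. The sketch below assumes a hypothetical helper name, `is_macos_10_13`, and takes the version string as an argument rather than calling `sw_vers`, so it can be run on any platform. It relies on the `:` operator of `expr`, which matches the pattern anchored at the start of the string and prints the number of matched characters.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the version string starts with 10.13.
is_macos_10_13() {
  # expr's ":" operator anchors the pattern at the start of the string and
  # prints the match length; discard that output and keep the exit status.
  expr "$1" : '10\.13' >/dev/null 2>&1
}

if is_macos_10_13 "10.13.6"; then
  echo "10.13.x detected"
fi
```

Because the match is a prefix match, a string such as `10.130` would also succeed; that is acceptable here only on the assumption that no such macOS version exists.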



[arrow] branch main updated: GH-33807: [R] Add a message if we detect running under emulation (#37777)

2023-09-19 Thread jonkeane
jonkeane pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 64ad8e564e GH-33807: [R] Add a message if we detect running under 
emulation (#37777)
64ad8e564e is described below

commit 64ad8e564ea013101b8565ce200e54e5c85bac8d
Author: Jonathan Keane 
AuthorDate: Tue Sep 19 11:15:27 2023 -0500

GH-33807: [R] Add a message if we detect running under emulation (#37777)

Resolves #33807 and #37034

### Rationale for this change

If someone is running R under emulation, arrow segfaults without error. We 
can detect this when we load so can also warn people that this is not 
recommended. Though the version of R being run is not directly an arrow issue, 
arrow fails very quickly in this configuration.

### What changes are included in this PR?

Detect when running under rosetta (on macOS only) and warn when the library 
is attached

### Are these changes tested?

No, given the paucity of ARM-based mac CI, testing this organically would 
be difficult. But the logic is straightforward.

### Are there any user-facing changes?

Yes, a warning when someone loads arrow under emulation.
* Closes: #33807

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/R/arrow-package.R | 21 +
 r/R/install-arrow.R |  4 +---
 r/README.md |  2 ++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 8f44f8936b..09183250ba 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -183,6 +183,22 @@ configure_tzdb <- function() {
   # Just to be extra safe, let's wrap this in a try();
   # we don't want a failed startup message to prevent the package from loading
   try({
+# On MacOS only, Check if we are running in under emulation, and warn 
this will not work
+if (on_rosetta()) {
+  packageStartupMessage(
+paste(
+  "Warning:",
+  "  It appears that you are running R and Arrow in emulation (i.e. 
you're",
+  "  running an Intel version of R on a non-Intel mac). This 
configuration is",
+  "  not supported by arrow, you should install a native (arm64) build 
of R",
+  "  and use arrow with that. See 
https://cran.r-project.org/bin/macosx/",
+  "",
+  sep = "\n"
+)
+  )
+}
+
+
 features <- arrow_info()$capabilities
 # That has all of the #ifdef features, plus the compression libs and the
 # string libraries (but not the memory allocators, they're added elsewhere)
@@ -225,6 +241,11 @@ on_macos_10_13_or_lower <- function() {
 package_version(unname(Sys.info()["release"])) < "18.0.0"
 }
 
+on_rosetta <- function() {
+  identical(tolower(Sys.info()[["sysname"]]), "darwin") &&
+identical(system("sysctl -n sysctl.proc_translated", intern = TRUE), "1")
+}
+
 option_use_threads <- function() {
   !is_false(getOption("arrow.use_threads"))
 }
diff --git a/r/R/install-arrow.R b/r/R/install-arrow.R
index 8380fa2af9..7017d4f39b 100644
--- a/r/R/install-arrow.R
+++ b/r/R/install-arrow.R
@@ -61,7 +61,6 @@ install_arrow <- function(nightly = FALSE,
   verbose = Sys.getenv("ARROW_R_DEV", FALSE),
   repos = getOption("repos"),
   ...) {
-  sysname <- tolower(Sys.info()[["sysname"]])
   conda <- isTRUE(grepl("conda", R.Version()$platform))
 
   if (conda) {
@@ -80,8 +79,7 @@ install_arrow <- function(nightly = FALSE,
 # On the M1, we can't use the usual autobrew, which pulls Intel 
dependencies
 apple_m1 <- grepl("arm-apple|aarch64.*darwin", R.Version()$platform)
 # On Rosetta, we have to build without JEMALLOC, so we also can't autobrew
-rosetta <- identical(sysname, "darwin") && identical(system("sysctl -n 
sysctl.proc_translated", intern = TRUE), "1")
-if (rosetta) {
+if (on_rosetta()) {
   Sys.setenv(ARROW_JEMALLOC = "OFF")
 }
 if (apple_m1 || rosetta) {
diff --git a/r/README.md b/r/README.md
index d343d6979c..3c1e3570ff 100644
--- a/r/README.md
+++ b/r/README.md
@@ -73,6 +73,8 @@ additional steps should be required.
 
 There are some special cases to note:
 
+- On macOS, the R you use with Arrow should match the architecture of the 
machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use 
R compiled for arm64. If you're using an Intel based mac, use R compiled for 
x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will 
result in segfaults and crashes. 
+
 - On Linux the installation process can sometimes be more involved because 
 CRAN does not host binaries for Linux. For more information please see the 
[installation guide](https://arrow.apache.org/docs/r/articles/install.html).
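
The detection logic in `on_rosetta()` can also be checked from a terminal before starting R. The following is a rough shell counterpart, not part of the commit: the shell function name is reused for illustration only, and the warning text is a shortened paraphrase. It assumes, as the R code does, that `sysctl -n sysctl.proc_translated` prints `1` for a translated process; errors are silenced because the key may be absent on some machines.

```shell
#!/bin/sh
# Hypothetical shell counterpart of the R helper on_rosetta().
on_rosetta() {
  # Only macOS (Darwin) can be running under Rosetta.
  [ "$(uname -s)" = "Darwin" ] || return 1
  # sysctl.proc_translated is 1 for a translated (emulated) process;
  # the key may be absent, so errors are silenced.
  [ "$(sysctl -n sysctl.proc_translated 2>/dev/null)" = "1" ]
}

if on_rosetta; then
  echo "Warning: running under emulation; install a native (arm64) build of R" >&2
fi
```

On Linux or on a natively running macOS shell the function returns non-zero and prints nothing.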
 



[arrow] branch master updated: GH-15205: [R] Fix a parquet-fixture finding in R tests (#15207)

2023-01-06 Thread jonkeane
jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bd847b2ae GH-15205: [R] Fix a parquet-fixture finding in R tests 
(#15207)
6bd847b2ae is described below

commit 6bd847b2aefdb0f10eaf83a3bfe2dc8ee269e8e4
Author: Jonathan Keane 
AuthorDate: Fri Jan 6 08:25:20 2023 -0600

GH-15205: [R] Fix a parquet-fixture finding in R tests (#15207)

A follow on to #15197 where we actually force these tests when the 
force-tests job is run + make sure that we look at the root of the filesystem 
for the fixtures
* Closes: #15205

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 .github/workflows/r.yml |  2 ++
 docker-compose.yml  |  1 +
 r/tests/testthat/test-parquet.R | 23 +--
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 9173f0e530..e7b1ee06e9 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -69,6 +69,7 @@ jobs:
 uses: actions/checkout@v3
 with:
   fetch-depth: 0
+  submodules: recursive
   - name: Cache Docker Volumes
 uses: actions/cache@v3
 with:
@@ -137,6 +138,7 @@ jobs:
 uses: actions/checkout@v3
 with:
   fetch-depth: 0
+  submodules: recursive
   - name: Setup Python
 uses: actions/setup-python@v4
 with:
diff --git a/docker-compose.yml b/docker-compose.yml
index 23583d6b65..df497a2de1 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1242,6 +1242,7 @@ services:
   LIBARROW_BUILD: 'false'
   NOT_CRAN: 'true'
   ARROW_R_DEV: ${ARROW_R_DEV}
+  ARROW_SOURCE_HOME: '/arrow'
 volumes: *ubuntu-volumes
 command: >
   /bin/bash -c "
diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R
index be71d813bd..e1e54a5139 100644
--- a/r/tests/testthat/test-parquet.R
+++ b/r/tests/testthat/test-parquet.R
@@ -458,22 +458,17 @@ test_that("Can read parquet with nested lists and maps", {
   # * ../cpp/submodules/parquet-testing/data
   # ARROW_SOURCE_HOME is set in many of our CI setups, so that will find the files
   # the .. version should catch some (thought not all) ways of running tests locally
-  parquet_test_data <- file.path(
-Sys.getenv("ARROW_SOURCE_HOME", test_path("..")),
-"cpp",
-"submodules",
-"parquet-testing",
-"data"
-  )
-  skip_if_not(dir.exists(parquet_test_data), "Parquet test data missing")
+  base_path <- Sys.getenv("ARROW_SOURCE_HOME", "..")
+  # make this a full path, at the root of the filesystem if we're using ARROW_SOURCE_HOME
+  if (base_path != "..") {
+base_path <- file.path("", base_path)
+  }
+  parquet_test_data <- file.path(base_path, "cpp", "submodules", "parquet-testing", "data")
+  skip_if_not(dir.exists(parquet_test_data) | force_tests(), "Parquet test data missing")
 
   pq <- read_parquet(paste0(parquet_test_data, "/nested_lists.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(pq$a$type, list_of(list_of(list_of(utf8()))), ignore_attr = TRUE)
+  expect_type_equal(pq$a, list_of(field("element", list_of(field("element", list_of(field("element", utf8())))))))
 
   pq <- read_parquet(paste0(parquet_test_data, "/nested_maps.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(
-pq$a$type,
-map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE))),
-ignore_attr = TRUE
-  )
+  expect_true(pq$a$type == map_of(utf8(), map_of(int32(), field("value", boolean(), nullable = FALSE))))
 })
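
The lookup in the diff above (an environment variable with a relative fallback, plus a skip that can be overridden by forcing tests) can be sketched roughly as follows. This is a Python analogue for illustration only, not the R code from the commit: the function names and the `env` parameter are hypothetical, and the sketch assumes a single leading slash is wanted when ARROW_SOURCE_HOME is set.

```python
import os
from pathlib import Path


def parquet_test_data_dir(env=None):
    """Return the directory expected to hold the parquet-testing fixtures.

    Hypothetical helper: the name and the `env` parameter are assumptions
    made for this sketch, not part of the Arrow test suite.
    """
    env = os.environ if env is None else env
    # Fall back to the parent directory when ARROW_SOURCE_HOME is unset.
    base = env.get("ARROW_SOURCE_HOME", "..")
    if base != "..":
        # Anchor the CI-provided path at the root of the filesystem.
        base = "/" + base.lstrip("/")
    return Path(base, "cpp", "submodules", "parquet-testing", "data")


def should_skip(data_dir, force_tests):
    """Skip only when the fixtures are missing and tests are not forced."""
    return not (data_dir.is_dir() or force_tests)
```

With `force_tests` set, a missing fixture directory no longer hides the test, so a broken CI setup should surface as a failure rather than a silent skip.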



[arrow] branch master updated: GH-15001: [R] Fix Parquet datatype test failure (#15197)

2023-01-05 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d4a0c9e8be GH-15001: [R] Fix Parquet datatype test failure (#15197)
d4a0c9e8be is described below

commit d4a0c9e8be8f2730dd80be9934e27aa6bd4a0850
Author: Will Jones 
AuthorDate: Thu Jan 5 09:02:19 2023 -0800

GH-15001: [R] Fix Parquet datatype test failure (#15197)


* Closes: #15001

Lead-authored-by: Will Jones 
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/tests/testthat/test-parquet.R | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R
index 591805d4ff..be71d813bd 100644
--- a/r/tests/testthat/test-parquet.R
+++ b/r/tests/testthat/test-parquet.R
@@ -453,12 +453,27 @@ test_that("deprecated int96 timestamp unit can be 
specified when reading Parquet
 })
 
 test_that("Can read parquet with nested lists and maps", {
-  parquet_test_data <- test_path("../../../cpp/submodules/parquet-testing/data")
+  # Construct the path to the parquet-testing submodule. This will search:
+  # * $ARROW_SOURCE_HOME/cpp/submodules/parquet-testing/data
+  # * ../cpp/submodules/parquet-testing/data
+  # ARROW_SOURCE_HOME is set in many of our CI setups, so that will find the files
+  # the .. version should catch some (thought not all) ways of running tests locally
+  parquet_test_data <- file.path(
+Sys.getenv("ARROW_SOURCE_HOME", test_path("..")),
+"cpp",
+"submodules",
+"parquet-testing",
+"data"
+  )
   skip_if_not(dir.exists(parquet_test_data), "Parquet test data missing")
 
   pq <- read_parquet(paste0(parquet_test_data, "/nested_lists.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(pq$a$type, list_of(list_of(list_of(utf8()))))
+  expect_equal(pq$a$type, list_of(list_of(list_of(utf8()))), ignore_attr = TRUE)
 
   pq <- read_parquet(paste0(parquet_test_data, "/nested_maps.snappy.parquet"), as_data_frame = FALSE)
-  expect_equal(pq$a$type, map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE))))
+  expect_equal(
+pq$a$type,
+map_of(utf8(), map_of(int32(), field("val", boolean(), nullable = FALSE))),
+ignore_attr = TRUE
+  )
 })



[arrow] branch master updated: GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners (#15116)

2023-01-03 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 4dd5cedb21 GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 
on GHA runners (#15116)
4dd5cedb21 is described below

commit 4dd5cedb21d7b58d837bdb3c0d35a5cd80fd9f4b
Author: Jacob Wujciak-Jens 
AuthorDate: Tue Jan 3 19:32:58 2023 +0100

GH-15114: [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners 
(#15116)


* Closes: #15114

Authored-by: Jacob Wujciak-Jens 
Signed-off-by: Jonathan Keane 
---
 dev/tasks/macros.jinja| 5 +
 dev/tasks/r/github.macos.brew.yml | 7 +--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/dev/tasks/macros.jinja b/dev/tasks/macros.jinja
index 72f575a188..9cb0c0f8a8 100644
--- a/dev/tasks/macros.jinja
+++ b/dev/tasks/macros.jinja
@@ -235,6 +235,10 @@ on:
   brew unlink python@2 || true
   brew config
   brew doctor || true
+  # The GHA runners install of python > 3.10 is incompatible with brew so 
we
+  # have to force overwritting of the symlinks
+  # see https://github.com/actions/runner-images/issues/6868
+  brew install --overwrite python@3.11 python@3.10
 
   ARROW_GLIB_FORMULA=$(echo ${ARROW_FORMULA} | sed -e 's/\.rb/-glib.rb/')
   echo "ARROW_GLIB_FORMULA=${ARROW_GLIB_FORMULA}" >> ${GITHUB_ENV}
@@ -396,3 +400,4 @@ on:
   {{ key }}: "{{ value }}"
   {% endfor %}
 {% endmacro %}
+
diff --git a/dev/tasks/r/github.macos.brew.yml 
b/dev/tasks/r/github.macos.brew.yml
index 5f426ab42c..7cf86d999d 100644
--- a/dev/tasks/r/github.macos.brew.yml
+++ b/dev/tasks/r/github.macos.brew.yml
@@ -31,14 +31,17 @@ jobs:
 env:
 {{ macros.github_set_sccache_envvars()|indent(8)}}  
 run: |
+  
   brew install sccache
+  # for testing
+  brew install minio
+  
   # TODO: Update the TODO for ARROW-16907 below to refer to main 
instead of master
   #   after migrating the default branch to main.
   # TODO(ARROW-16907): apache/arrow@master seems to be installed 
already
   # so this does nothing on a branch/PR
   brew install -v --HEAD apache-arrow
-  # for testing
-  brew install minio
+
   - uses: r-lib/actions/setup-r@v2
   - name: Install dependencies
 run: |



[arrow] branch master updated (353ab45cd4 -> 13ede7bb17)

2022-10-06 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 353ab45cd4 ARROW-17684: [CI][deb] Disable Flight for arm64 (#14300)
 add 13ede7bb17 ARROW-16605: [CI][R] Fix revdep docker job (#13483)

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_revdepcheck.sh  | 58 
 dev/tasks/r/github.linux.revdepcheck.yml | 57 ---
 dev/tasks/tasks.yml  |  4 ---
 docker-compose.yml   | 11 +++---
 r/.Rbuildignore  |  1 +
 5 files changed, 57 insertions(+), 74 deletions(-)
 delete mode 100644 dev/tasks/r/github.linux.revdepcheck.yml



[arrow] branch master updated: MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)

2022-08-23 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 78d586a458 MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)
78d586a458 is described below

commit 78d586a45852b69c40b88a43d86a1c90efdf1e0d
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Tue Aug 23 23:00:56 2022 +0900

MINOR: [R][Docs] Fix the Rd file of `infer_type` (#13878)

Authored-by: SHIMA Tatsuya 
Signed-off-by: Jonathan Keane 
---
 r/man/infer_type.Rd | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/r/man/infer_type.Rd b/r/man/infer_type.Rd
index e340afa915..1bba272556 100644
--- a/r/man/infer_type.Rd
+++ b/r/man/infer_type.Rd
@@ -19,9 +19,6 @@ type(x)
 An arrow \link[=data-type]{data type}
 }
 \description{
-Infer the arrow Array type from an R object.
-}
-\details{
 \code{\link[=type]{type()}} is deprecated in favor of 
\code{\link[=infer_type]{infer_type()}}.
 }
 \examples{



[arrow] branch master updated: ARROW-17084: [R] Install the package before linting (#13620)

2022-08-02 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 51eb3c8adb ARROW-17084: [R] Install the package before linting (#13620)
51eb3c8adb is described below

commit 51eb3c8adb5742f8d0d05c2e371dfbc651499614
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Tue Aug 2 13:49:15 2022 +0100

ARROW-17084: [R] Install the package before linting (#13620)

The package should be installed before running `lintr::lint_package()` or 
`lintr::expect_lint_free()` (our case), otherwise we could encounter some false 
positives.

See https://github.com/r-lib/lintr/issues/352#issuecomment-587004345 and 
https://github.com/r-lib/lintr/issues/406#issuecomment-534601141


Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 .github/workflows/r.yml | 8 
 1 file changed, 8 insertions(+)

diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 4a9c605e3b..4f706e3e5b 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -327,6 +327,14 @@ jobs:
 shell: Rscript {0}
 working-directory: r
 run: |
+  Sys.setenv(
+RWINLIB_LOCAL = file.path(Sys.getenv("GITHUB_WORKSPACE"), "r", 
"windows", "libarrow.zip"),
+MAKEFLAGS = paste0("-j", parallel::detectCores()),
+ARROW_R_DEV = TRUE,
+"_R_CHECK_FORCE_SUGGESTS_" = FALSE
+  )
+  # we use pak for package installation since it is faster, safer and 
more convenient
+  pak::local_install()
   pak::pak("lintr")
   lintr::expect_lint_free()
   - name: Dump install logs



[arrow] branch master updated (036fdf2d03 -> 778d574b1a)

2022-07-29 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 036fdf2d03 ARROW-17246: [Packaging][deb][RPM] Don't use system 
jemalloc (#13739)
 add 778d574b1a ARROW-17166: [R] [CI] force_tests() cannot return TRUE 
(#13680)

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/helper-skip.R | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow-nanoarrow] branch jonkeane-patch-1 created (now 68c9380)

2022-07-13 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch jonkeane-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git


  at 68c9380  Minor typo fix

This branch includes the following new commits:

 new 68c9380  Minor typo fix

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow-nanoarrow] 01/01: Minor typo fix

2022-07-13 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch jonkeane-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git

commit 68c938035f51b18fa8e3f0ded079bcc8ef975c0a
Author: Jonathan Keane 
AuthorDate: Wed Jul 13 10:44:13 2022 -0500

Minor typo fix
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b1051a0..9750f30 100644
--- a/README.md
+++ b/README.md
@@ -77,5 +77,5 @@ requiring a library with a similar scope:
   along which a [mostly header-only C++ 
library](https://github.com/paleolimbot/geonanoarrowpp/tree/main/src/geoarrow/internal/arrow-hpp)
   was prototyped.
 - The [Arrow Database Connectivity](https://github.com/apache/arrow-adbc) C 
API, for which drivers
-  in theory can be written in C (which is currently difficult in practice 
because of there
+  in theory can be written in C (which is currently difficult in practice 
because there
   are few if any tools to help do this properly).



[arrow] branch master updated: ARROW-17059: [C++] Fix expression benchmark (#13584)

2022-07-12 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new b87e0c1dad ARROW-17059: [C++] Fix expression benchmark (#13584)
b87e0c1dad is described below

commit b87e0c1dad77c2d95fb979bce831a57d6ae60daa
Author: Sasha Krassovsky 
AuthorDate: Tue Jul 12 12:59:16 2022 -0800

ARROW-17059: [C++] Fix expression benchmark (#13584)

Authored-by: Sasha Krassovsky 
Signed-off-by: Jonathan Keane 
---
 cpp/src/arrow/compute/exec/expression_benchmark.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpp/src/arrow/compute/exec/expression_benchmark.cc 
b/cpp/src/arrow/compute/exec/expression_benchmark.cc
index 70aa509d2e..debd228498 100644
--- a/cpp/src/arrow/compute/exec/expression_benchmark.cc
+++ b/cpp/src/arrow/compute/exec/expression_benchmark.cc
@@ -80,8 +80,8 @@ static void ExecuteScalarExpressionOverhead(benchmark::State& 
state, Expression
   });
   std::vector<ExecBatch> inputs(num_batches);
   for (auto& batch : inputs) {
-batch = ExecBatch({Datum(ConstantArrayGenerator::Int64(rows_per_batch, 
5))},
-  /*length=*/1);
+batch = ExecBatch({Datum(ConstantArrayGenerator::Int64(rows_per_batch, 
/*value=*/5))},
+  /*length=*/rows_per_batch);
   }
 
   ASSIGN_OR_ABORT(auto bound, expr.Bind(*dataset_schema));



[arrow] branch master updated (5fae150493 -> 3c1caea36a)

2022-06-07 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 5fae150493 ARROW-16726: [Python] Fix Setuptools warnings about 
installing packages as data (#13309)
 add 3c1caea36a ARROW-16415: [R] Update `strptime` binding signature with 
the `tz` argument (#13190)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R   | 50 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 67 +++-
 2 files changed, 86 insertions(+), 31 deletions(-)



[arrow] branch master updated: ARROW-16626: [C++] Name the C++ streaming execution engine

2022-06-01 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new bc4a82fd5b ARROW-16626: [C++] Name the C++ streaming execution engine
bc4a82fd5b is described below

commit bc4a82fd5b65d90e97b773ca728442f369eb9951
Author: Weston Pace 
AuthorDate: Wed Jun 1 17:26:14 2022 -0500

ARROW-16626: [C++] Name the C++ streaming execution engine

Closes #13207 from westonpace/feature/ARROW-16626--name-query-engine

Lead-authored-by: Weston Pace 
Co-authored-by: Will Jones 
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 docs/source/cpp/overview.rst|  3 +++
 docs/source/cpp/streaming_execution.rst | 39 +
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/source/cpp/overview.rst b/docs/source/cpp/overview.rst
index ccebdba45d..33f075bd18 100644
--- a/docs/source/cpp/overview.rst
+++ b/docs/source/cpp/overview.rst
@@ -66,6 +66,9 @@ reference.
 **Kernels** are specialized computation functions running in a loop over a
 given set of datums representing input and output parameters to the functions.
 
+**Acero** (pronounced [aˈsɜɹo] / ah-SERR-oh) is a streaming execution engine 
that allows
+computation to be expressed as a graph of operators which can transform 
streams of data.
+
 The IO layer
 
 
diff --git a/docs/source/cpp/streaming_execution.rst 
b/docs/source/cpp/streaming_execution.rst
index 649968ad43..7ce25f587d 100644
--- a/docs/source/cpp/streaming_execution.rst
+++ b/docs/source/cpp/streaming_execution.rst
@@ -19,14 +19,13 @@
 .. highlight:: cpp
 .. cpp:namespace:: arrow::compute
 
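-==========================
-Streaming execution engine
-==========================
+=======================================
+Acero: A C++ streaming execution engine
+=======================================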
 
 .. warning::
 
-The streaming execution engine is experimental, and a stable API
-is not yet guaranteed.
+Acero is experimental and a stable API is not yet guaranteed.
 
 Motivation
 ==========
@@ -35,20 +34,23 @@ For many complex computations, successive direct 
:ref:`invocation of
 compute functions ` is not feasible
 in either memory or computation time. Doing so causes all intermediate
 data to be fully materialized. To facilitate arbitrarily large inputs
-and more efficient resource usage, Arrow also provides a streaming query
-engine with which computations can be formulated and executed.
+and more efficient resource usage, the Arrow C++ implementation also
+provides Acero, a streaming query engine with which computations can
+be formulated and executed.
 
 .. image:: simple_graph.svg
:alt: An example graph of a streaming execution workflow.
 
-:class:`ExecNode` is provided to reify the graph of operations in a query.
-Batches of data (:struct:`ExecBatch`) flow along edges of the graph from
-node to node. Structuring the API around streams of batches allows the
-working set for each node to be tuned for optimal performance independent
-of any other nodes in the graph. Each :class:`ExecNode` processes batches
-as they are pushed to it along an edge of the graph by upstream nodes
-(its inputs), and pushes batches along an edge of the graph to downstream
-nodes (its outputs) as they are finalized.
+Acero allows computation to be expressed as an "execution plan"
+(:class:`ExecPlan`) which is a directed graph of operators.  Each operator
+(:class:`ExecNode`) provides, transforms, or consumes the data passing
+through it.  Batches of data (:struct:`ExecBatch`) flow along edges of
+the graph from node to node. Structuring the API around streams of batches
+allows the working set for each node to be tuned for optimal performance
+independent of any other nodes in the graph. Each :class:`ExecNode`
+processes batches as they are pushed to it along an edge of the graph by
+upstream nodes (its inputs), and pushes batches along an edge of the graph
+to downstream nodes (its outputs) as they are finalized.
 
 .. seealso::
 
@@ -366,10 +368,9 @@ This function might be reading a file, iterating through 
an in memory structure,
 from a network connection.  The arrow library refers to these functions as 
``arrow::AsyncGenerator``
 and there are a number of utilities for working with these functions.  For 
this example we use 
 a vector of record batches that we've already stored in memory.
-In addition, the schema of the data must be known up front.  Arrow's streaming 
execution
-engine must know the schema of the data at each stage of the execution graph 
before any
-processing has begun.  This means we must supply the schema for a source node 
separately
-from the data itself.
+In addition, the schema of the data must be known up front.  Acero must know 
the schema of the data
+at each stage of the

[arrow] branch master updated: ARROW-14632: [Python] Make write_dataset arguments keyword-only

2022-06-01 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ffc10a43b ARROW-14632: [Python] Make write_dataset arguments 
keyword-only
2ffc10a43b is described below

commit 2ffc10a43b2b9a397bfeba993993172082f9722b
Author: Austin Dickey 
AuthorDate: Wed Jun 1 17:22:17 2022 -0500

ARROW-14632: [Python] Make write_dataset arguments keyword-only

As a best practice, most of the optional configuration arguments in 
`write_dataset()` should be keyword-only. This PR enforces that.

Closes #13289 from austin3dickey/ARROW-14632

Authored-by: Austin Dickey 
Signed-off-by: Jonathan Keane 
---
 python/pyarrow/dataset.py| 2 +-
 python/pyarrow/tests/test_dataset.py | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/python/pyarrow/dataset.py b/python/pyarrow/dataset.py
index 6c1b8db5a6..8ef3e2f7aa 100644
--- a/python/pyarrow/dataset.py
+++ b/python/pyarrow/dataset.py
@@ -801,7 +801,7 @@ def _ensure_write_partitioning(part, schema, flavor):
 return part
 
 
-def write_dataset(data, base_dir, basename_template=None, format=None,
+def write_dataset(data, base_dir, *, basename_template=None, format=None,
   partitioning=None, partitioning_flavor=None, schema=None,
   filesystem=None, file_options=None, use_threads=True,
   max_partitions=None, max_open_files=None,
diff --git a/python/pyarrow/tests/test_dataset.py 
b/python/pyarrow/tests/test_dataset.py
index 0be01d2336..d2210c4b6c 100644
--- a/python/pyarrow/tests/test_dataset.py
+++ b/python/pyarrow/tests/test_dataset.py
@@ -1796,6 +1796,12 @@ def 
test_dictionary_partitioning_outer_nulls_raises(tempdir):
 ds.write_dataset(table, tempdir, format='ipc', partitioning=part)
 
 
+def test_positional_keywords_raises(tempdir):
+table = pa.table({'a': ['x', 'y', None], 'b': ['x', 'y', 'z']})
+with pytest.raises(TypeError):
+ds.write_dataset(table, tempdir, "basename-{i}.arrow")
+
+
 @pytest.mark.parquet
 @pytest.mark.pandas
 def test_read_partition_keys_only(tempdir):
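
The effect of the bare `*` added to the signature above can be seen without pyarrow. The sketch below uses a hypothetical stand-in function (`write_dataset_sketch` is not part of pyarrow, and it only echoes its arguments) to show that everything after the `*` must be passed by keyword, which is the behavior the new test checks for.

```python
def write_dataset_sketch(data, base_dir, *, basename_template=None, format=None):
    """Hypothetical stand-in: the options after `*` are keyword-only."""
    return {
        "data": data,
        "base_dir": base_dir,
        "basename_template": basename_template,
        "format": format,
    }


def call_positionally():
    """Try to pass basename_template positionally and report what happens."""
    try:
        # The third positional argument is rejected because of the bare `*`.
        write_dataset_sketch([1, 2], "/tmp/out", "part-{i}.arrow")
    except TypeError:
        return "TypeError"
    return "no error"
```

Passing the same value as `basename_template="part-{i}.arrow"` is accepted, so existing callers that already use keywords are unaffected by a change like this.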



[arrow] branch master updated: MINOR: [Docs] Update auto_disconnect parameter based on ARROW-14395

2022-06-01 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8295bdc2e8 MINOR: [Docs] Update auto_disconnect parameter based on 
ARROW-14395
8295bdc2e8 is described below

commit 8295bdc2e86e657c59724c3e56da474e5414cb39
Author: Will Jones 
AuthorDate: Wed Jun 1 17:20:14 2022 -0500

MINOR: [Docs] Update auto_disconnect parameter based on ARROW-14395

#11482 changed the default value of the parameter, but didn't update the 
docs for it.

Closes #13290 from wjones127/minor-duckdb-doc

Authored-by: Will Jones 
Signed-off-by: Jonathan Keane 
---
 r/R/duckdb.R   | 6 ++
 r/man/to_duckdb.Rd | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/r/R/duckdb.R b/r/R/duckdb.R
index 3951362f8e..b924dafcab 100644
--- a/r/R/duckdb.R
+++ b/r/R/duckdb.R
@@ -26,9 +26,7 @@
 #' If `auto_disconnect = TRUE`, the DuckDB table that is created will be 
configured
 #' to be unregistered when the `tbl` object is garbage collected. This is 
helpful
 #' if you don't want to have extra table objects in DuckDB after you've 
finished
-#' using them. Currently, this cleanup can, however, sometimes lead to hangs if
-#' tables are created and deleted in quick succession, hence the default value
-#' of `FALSE`
+#' using them.
 #'
 #' @param .data the Arrow object (e.g. Dataset, Table) to use for the DuckDB 
table
 #' @param con a DuckDB connection to use (default will create one and store it
@@ -36,7 +34,7 @@
 #' @param table_name a name to use in DuckDB for this object. The default is a
 #' unique string `"arrow_"` followed by numbers.
 #' @param auto_disconnect should the table be automatically cleaned up when the
-#' resulting object is removed (and garbage collected)? Default: `FALSE`
+#' resulting object is removed (and garbage collected)? Default: `TRUE`
 #'
 #' @return A `tbl` of the new table in DuckDB
 #'
diff --git a/r/man/to_duckdb.Rd b/r/man/to_duckdb.Rd
index 8d6a9e5c62..79c089239b 100644
--- a/r/man/to_duckdb.Rd
+++ b/r/man/to_duckdb.Rd
@@ -21,7 +21,7 @@ in \code{options("arrow_duck_con")})}
 unique string \code{"arrow_"} followed by numbers.}
 
 \item{auto_disconnect}{should the table be automatically cleaned up when the
-resulting object is removed (and garbage collected)? Default: \code{FALSE}}
+resulting object is removed (and garbage collected)? Default: \code{TRUE}}
 }
 \value{
 A \code{tbl} of the new table in DuckDB
@@ -37,9 +37,7 @@ The result is a dbplyr-compatible object that can be used in 
d(b)plyr pipelines.
 If \code{auto_disconnect = TRUE}, the DuckDB table that is created will be 
configured
 to be unregistered when the \code{tbl} object is garbage collected. This is 
helpful
 if you don't want to have extra table objects in DuckDB after you've finished
-using them. Currently, this cleanup can, however, sometimes lead to hangs if
-tables are created and deleted in quick succession, hence the default value
-of \code{FALSE}
+using them.
 }
 \examples{
 \dontshow{if (getFromNamespace("run_duckdb_examples", "arrow")()) (if 
(getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}



[arrow] branch master updated: ARROW-16281: [R] [CI] Bump versions with the release of 4.2

2022-05-18 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new ce4dcbdf5f ARROW-16281: [R] [CI] Bump versions with the release of 4.2
ce4dcbdf5f is described below

commit ce4dcbdf5f1bcfb3f23b494598b9125c8e7ee52e
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Wed May 18 13:28:23 2022 -0700

ARROW-16281: [R] [CI] Bump versions with the release of 4.2

Update hard-coded versions on R in our CI after the release of R 4.2.

Closes #12980 from dragosmg/r_42_ci_update

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 .env   |  2 +-
 .github/workflows/r.yml|  6 +++---
 ci/docker/linux-apt-docs.dockerfile|  2 +-
 ci/docker/linux-apt-lint.dockerfile| 11 +++
 ci/docker/linux-apt-r.dockerfile   |  9 ++---
 dev/tasks/r/github.linux.arrow.version.back.compat.yml |  1 +
 dev/tasks/r/github.linux.versions.yml  |  3 ++-
 dev/tasks/tasks.yml| 10 +-
 8 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/.env b/.env
index 5c73161ac1..f56820daad 100644
--- a/.env
+++ b/.env
@@ -68,7 +68,7 @@ NODE=16
 NUMPY=latest
 PANDAS=latest
 PYTHON=3.8
-R=4.1
+R=4.2
 SPARK=master
 TURBODBC=latest
 
diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 19abac5bb2..8de703b71a 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -57,7 +57,7 @@ jobs:
 strategy:
   fail-fast: false
   matrix:
-r: ["4.1"]
+r: ["4.2"]
 ubuntu: [20.04]
 force-tests: ["true", "false"]
 env:
@@ -244,7 +244,7 @@ jobs:
 config:
 - { rtools: 35, rversion: "3.6" }
 - { rtools: 40, rversion: "4.1" }
-# TODO: Once R 4.2 comes out we can switch to devel + 4.2
+- { rtools: 42, rversion: "4.2" }
 - { rtools: 42, rversion: "devel" }
 env:
   ARROW_R_CXXFLAGS: "-Werror"
@@ -384,7 +384,7 @@ jobs:
 timeout = 3600
   )
   - name: Run lintr
-if: ${{ matrix.config.rversion == '4.1' }}
+if: ${{ matrix.config.rversion == '4.2' }}
 env:
   NOT_CRAN: "true"
   GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
diff --git a/ci/docker/linux-apt-docs.dockerfile 
b/ci/docker/linux-apt-docs.dockerfile
index 0ef1231321..3a8a9cf8e2 100644
--- a/ci/docker/linux-apt-docs.dockerfile
+++ b/ci/docker/linux-apt-docs.dockerfile
@@ -18,7 +18,7 @@
 ARG base
 FROM ${base}
 
-ARG r=4.1
+ARG r=4.2
 ARG jdk=8
 
 # See R install instructions at https://cloud.r-project.org/bin/linux/ubuntu/
diff --git a/ci/docker/linux-apt-lint.dockerfile 
b/ci/docker/linux-apt-lint.dockerfile
index 036be1ac13..249072ae32 100644
--- a/ci/docker/linux-apt-lint.dockerfile
+++ b/ci/docker/linux-apt-lint.dockerfile
@@ -40,16 +40,11 @@ RUN apt-get update && \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
 
-ARG r=4.1
+ARG r=4.2
 RUN wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc 
| \
 tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \
-# NOTE: R 3.5 and 3.6 are available in the repos with -cran35 suffix
-# for trusty, xenial, bionic, and eoan (as of May 2020)
-# -cran40 has 4.0 versions for bionic and focal
-# R 3.4 is available without the suffix but only for trusty and xenial
-# TODO: make sure OS version and R version are valid together and 
conditionally set repo suffix
-# This is a hack to turn 3.6 into 35, and 4.0/4.1 into 40:
-add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu 
'$(lsb_release -cs)'-cran'$(echo "${r}" | tr -d . | tr 6 5 | tr 1 0)'/' && \
+# NOTE: Only R >= 4.0 is available in this repo
+add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu 
'$(lsb_release -cs)'-cran40/' && \
 apt-get install -y \
 r-base=${r}* \
 r-recommended=${r}* \
diff --git a/ci/docker/linux-apt-r.dockerfile b/ci/docker/linux-apt-r.dockerfile
index 7526f78452..7083bfa3d9 100644
--- a/ci/docker/linux-apt-r.dockerfile
+++ b/ci/docker/linux-apt-r.dockerfile
@@ -38,13 +38,8 @@ RUN apt-get update -y && \
 software-properties-common && \
 wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc 
| \
 tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc && \
-# NOTE: R 3.5 and 3.6 are available in the repos with -cran35 suffix
-# for trusty, xenial, bionic, and eoan (as of May 2020)
-# -cran40 has 4.0 versions for bionic an

[arrow] branch master updated (b264dca5a0 -> 214135d8ce)

2022-05-09 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from b264dca5a0 MINOR: [R] Move tzdb loading out of .onLoad() to avoid a 
check NOTE
 add 214135d8ce ARROW-14848: [R] Implement bindings for lubridate's 
parse_date_time

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  6 ++
 r/R/dplyr-datetime-helpers.R | 45 +
 r/R/dplyr-funcs-datetime.R   | 36 +++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 96 +++-
 4 files changed, 182 insertions(+), 1 deletion(-)



[arrow] branch master updated: ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows

2022-04-29 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3c03d49364 ARROW-16073: [R] clean-up date time unit testing once tzdb 
is available on Windows
3c03d49364 is described below

commit 3c03d4936445781e29e41392d9a0bc3db62b39f2
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Fri Apr 29 17:34:09 2022 -0500

ARROW-16073: [R] clean-up date time unit testing once tzdb is available on 
Windows

Closes #12883 from dragosmg/datetime_unit_testing_cleanup

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 r/tests/testthat/test-dplyr-funcs-datetime.R | 5 -
 r/tests/testthat/test-dplyr-funcs-type.R | 7 +--
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R 
b/r/tests/testthat/test-dplyr-funcs-datetime.R
index a4c5ee3c22..47626a6cb1 100644
--- a/r/tests/testthat/test-dplyr-funcs-datetime.R
+++ b/r/tests/testthat/test-dplyr-funcs-datetime.R
@@ -841,8 +841,6 @@ test_that("month() supports integer input", {
   test_df_month
 )
 
-skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
-
 compare_dplyr_binding(
   .input %>%
 # R returns ordered factor whereas Arrow returns character
@@ -904,8 +902,6 @@ test_that("month() errors with double input and returns NA 
with int outside 1:12
 })
 
 test_that("date works in arrow", {
-  # https://issues.apache.org/jira/browse/ARROW-13168
-  skip_on_os("windows")
   # this date is specific since lubridate::date() is different from 
base::as.Date()
   # since as.Date returns the UTC date and date() doesn't
   test_df <- tibble(
@@ -1123,7 +1119,6 @@ test_that("difftime works correctly", {
 ignore_attr = TRUE
   )
 
-  skip_on_os("windows")
   test_df_with_tz <- tibble(
 time1 = as.POSIXct(
   c("2021-02-20", "2021-07-31", "2021-10-30", "2021-01-31"),
diff --git a/r/tests/testthat/test-dplyr-funcs-type.R 
b/r/tests/testthat/test-dplyr-funcs-type.R
index 6a07d36e81..e4283e39b5 100644
--- a/r/tests/testthat/test-dplyr-funcs-type.R
+++ b/r/tests/testthat/test-dplyr-funcs-type.R
@@ -873,7 +873,6 @@ test_that("`as.Date()` and `as_date()`", {
 fixed = TRUE
   )
 
-
   # we do not support as.Date() with double/ float (error surfaced from C++)
   # TODO revisit after https://issues.apache.org/jira/browse/ARROW-15798
   expect_error(
@@ -958,7 +957,11 @@ test_that("`as_datetime()`", {
 })
 
 test_that("format date/time", {
-  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+  # locale issues
+  # TODO revisit after https://issues.apache.org/jira/browse/ARROW-16399 is 
done
+  if (tolower(Sys.info()[["sysname"]]) == "windows") {
+withr::local_locale(LC_TIME = "C")
+  }
   # In 3.4 the lack of tzone attribute causes spurious failures
   skip_if_r_version("3.4.4")
 



[arrow] branch master updated: ARROW-16373: [Docs][CI] Small improvements to CI documentation

2022-04-27 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new e0061bbb8c ARROW-16373: [Docs][CI] Small improvements to CI 
documentation
e0061bbb8c is described below

commit e0061bbb8cd269f4e8b880d1a4ba181312cbc07f
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Wed Apr 27 17:24:47 2022 -0500

ARROW-16373: [Docs][CI] Small improvements to CI documentation

Closes #12989 from dragosmg/patch-1

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 .../developers/continuous_integration/overview.rst | 14 +++---
 .../developers/guide/step_by_step/arrow_codebase.rst   | 12 ++--
 .../developers/guide/step_by_step/pr_lifecycle.rst | 18 +-
 r/vignettes/developers/workflow.Rmd|  4 ++--
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/docs/source/developers/continuous_integration/overview.rst 
b/docs/source/developers/continuous_integration/overview.rst
index 73aef370cd..3c21c17063 100644
--- a/docs/source/developers/continuous_integration/overview.rst
+++ b/docs/source/developers/continuous_integration/overview.rst
@@ -31,18 +31,18 @@ Some files central to Arrow CI are:
 
 We use :ref:`Docker` in order to have portable and reproducible 
Linux builds, as well as running Windows builds in Windows containers.  We use 
:ref:`Archery` and :ref:`Crossbow` to help co-ordinate the 
various CI tasks.
 
-One thing to note is the some of the services defined in 
``docker-compose.yml`` are interdependent.  When running services locally, you 
must either manually build its dependencies first, or build it via the use of 
``archery run ...`` which automatically finds and builds dependencies. 
+One thing to note is the some of the services defined in 
``docker-compose.yml`` are interdependent.  When running services locally, you 
must either manually build its dependencies first, or build it via the use of 
``archery run ...`` which automatically finds and builds dependencies.
 
 There are numerous important directories in the Arrow project which relate to 
CI:
 
 - ``.github/worflows`` - workflows that are run via GitHub actions and are 
triggered by things like pull requests being submitted or merged
-- ``dev/tasks`` - containing on-demand jobs triggered/submitted via ``archery 
crossbow submit ...``, typically nightly builds or relating to the release 
process
+- ``dev/tasks`` - containing extended jobs triggered/submitted via ``archery 
crossbow submit ...``, typically nightly builds or relating to the release 
process
 - ``ci/`` - containing scripts, dockerfiles, and any supplemental files, e.g. 
patch files, conda environment files, vcpkg triplet files.
 
 Instead of thinking about Arrow CI in terms of files and folders, it may be 
conceptually simpler to instead divide it into 2 main categories:
 
-- CI jobs which are triggered based on specific actions on GitHub (pull 
requests opened, pull requests merged, etc)
-- On-demand builds which are manually triggered on a nightly basis or via 
Archery
+- **action-triggered builds**: CI jobs which are triggered based on specific 
actions on GitHub (pull requests opened, pull requests merged, etc)
+- **extended builds**: manually triggered with many being run on a nightly 
basis
 
 Action-triggered builds
 ---
@@ -61,9 +61,9 @@ The ``.yml`` files in ``.github/worflows`` are workflows 
which are run on GitHub
 There are two other files which define action-triggered builds:
 
 - ``.travis.yml`` - runs on all commits and is used to test on architectures 
such as ARM and S390x
-- ``appveyor.yml`` - runs on commits related to Python or C++ 
+- ``appveyor.yml`` - runs on commits related to Python or C++
 
-On-demand builds
+Extended builds
 ---
 
 Crossbow is a subcomponent of Archery and can be used to manually trigger 
builds.  The tasks which can be run on Crossbow can be found in the 
``dev/tasks`` directory.  This directory contains:
@@ -73,4 +73,4 @@ Crossbow is a subcomponent of Archery and can be used to 
manually trigger builds
 
 Most of these tasks are run as part of the nightly builds, though also can be 
triggered manually by add a comment to a PR which begins with ``@github-actions 
crossbow submit`` followed by the name of the task to be run.
 
-For convenience purpose, the tasks in ``dev/tasks/tasks.yml`` are defined in 
groups, which makes it simpler for multiple tasks to be submitted to Crossbow 
at once.  The task definitions here contain information about which service 
defined in ``docker-compose.yml`` to run, the CI service to run the task on, 
and which template file to use as the basis for that task.
\ No newline at end of file
+For convenience purpose, the tasks in ``dev/tasks/tasks.yml`` are defined in 
groups, which

[arrow] branch master updated (24f372297c -> d92777270b)

2022-04-27 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 24f372297c ARROW-16294: [C++] Improve performance of parquet readahead
 add d92777270b ARROW-16325: [R] Add task for R package with gcc12

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)



[arrow] branch master updated: ARROW-16374: [R] [C++] skip another snappy test during sanitizer runs

2022-04-27 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f03f090d0c ARROW-16374: [R] [C++] skip another snappy test during 
sanitizer runs
f03f090d0c is described below

commit f03f090d0c9fd6c85e046e2790c5d443729f6b30
Author: Jonathan Keane 
AuthorDate: Wed Apr 27 15:14:58 2022 -0500

ARROW-16374: [R] [C++] skip another snappy test during sanitizer runs

Another example of https://github.com/google/snappy/pull/148

Closes #13014 from jonkeane/ARROW-16374

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/tests/testthat/test-parquet.R | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/r/tests/testthat/test-parquet.R b/r/tests/testthat/test-parquet.R
index dbafd5d62c..1737b7100c 100644
--- a/r/tests/testthat/test-parquet.R
+++ b/r/tests/testthat/test-parquet.R
@@ -197,6 +197,8 @@ test_that("Maps are preserved when writing/reading from 
Parquet", {
 })
 
 test_that("read_parquet() and write_parquet() accept connection objects", {
+  skip_if_not_available("snappy")
+
   tf <- tempfile()
   on.exit(unlink(tf))
 



[arrow-site] branch master updated: ARROW-16244: [Website] Arrow for R cheatsheet blog post (#204)

2022-04-27 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/master by this push:
 new e7fe76ae3f ARROW-16244: [Website] Arrow for R cheatsheet blog post 
(#204)
e7fe76ae3f is described below

commit e7fe76ae3f92be639fe0ccf7d2a6d0fe47cf775e
Author: Stephanie Hazlitt 
AuthorDate: Wed Apr 27 07:39:25 2022 -0700

ARROW-16244: [Website] Arrow for R cheatsheet blog post (#204)

* add cheatsheet thumbnail png

* add cheatsheet post md file

* use local contributors for author in yaml

* tweak image+header sizes

* centre png

* edit and use r cookbook url

Co-authored-by: Nicola Crane 

* minor edit

Co-authored-by: Nic Crane 

* add Apache to title

Co-authored-by: Neal Richardson 

* add Apache in first ref

Co-authored-by: Neal Richardson 

* bump publish date

* update img date

* update filename date

* update thumbnail url

Co-authored-by: Nicola Crane 
Co-authored-by: Neal Richardson 
---
 _data/contributors.yml|   3 ++
 _posts/2022-04-27-arrow-r-cheatsheet.md   |  49 ++
 img/20220427-arrow-r-cheatsheet-thumbnail.png | Bin 0 -> 814228 bytes
 3 files changed, 52 insertions(+)

diff --git a/_data/contributors.yml b/_data/contributors.yml
index 3471b50c67..0a01adb255 100644
--- a/_data/contributors.yml
+++ b/_data/contributors.yml
@@ -55,4 +55,7 @@
 - name: Ruan Pearce-Authers
   apacheId: ruanpa # Not a real apacheId
   githubId: returnString
+- name: Stephanie Hazlitt
+  apacheId: stephhazlitt
+  githubId: stephhazlitt
 # End contributors.yml
diff --git a/_posts/2022-04-27-arrow-r-cheatsheet.md 
b/_posts/2022-04-27-arrow-r-cheatsheet.md
new file mode 100644
index 00..3e7651097d
--- /dev/null
+++ b/_posts/2022-04-27-arrow-r-cheatsheet.md
@@ -0,0 +1,49 @@
+---
+layout: post
+title: Apache Arrow for R Cheatsheet
+date: "2022-04-27 00:00:00"
+author: stephhazlitt
+categories: [application]
+---
+
+
+We are excited to introduce the new [Apache Arrow for R 
Cheatsheet](https://github.com/apache/arrow/blob/master/r/cheatsheet/arrow-cheatsheet.pdf).
+
+
+https://github.com/apache/arrow/blob/master/r/cheatsheet/arrow-cheatsheet.pdf
+
+
+
+
+## Helping (Not Cheating)
+
+While [cheatsheets](https://en.wikipedia.org/wiki/Cheat_sheet) may have 
started as a set of notes used without an instructor’s knowledge (so, 
ummm, cheating), using the Arrow for R cheatsheet is definitely not 
cheating! Today, cheatsheets are a common tool to provide users an introduction 
to software’s functionality and a quick reference guide to help users get 
started.
+
+The Arrow for R cheatsheet is intended to be an easy-to-scan introduction to 
the Arrow R package and Arrow data structures, with getting started sections on 
some of the package’s main functionality. The cheatsheet includes introductory 
snippets on using Arrow to read and work with larger-than-memory multi-file 
data sets, sending and receiving data with Flight, reading data from cloud 
storage without downloading the data first, and more. The Arrow for R 
cheatsheet also directs users to th [...]
+
+## Cheatsheet Maintenance
+
+See something that needs updating? Or want to suggest a change? Like software 
itself, a package cheatsheet needs maintenance to keep pace with new features 
or user-facing changes. Contributions can be made by downloading and making 
changes to the [`arrow-cheatsheet.pptx` 
file](https://github.com/apache/arrow/tree/master/r/cheatsheet) (in Microsoft 
PowerPoint or Google Slides), and offering the revised `.pptx` and rendered PDF 
back to the project following the _new_ [New Contributors Guid [...]
+
+## By the Community For the Community
+
+The Arrow for R cheatsheet was initiated by Mauricio (Pachá) Vargas Sepúlveda 
([ARROW-13616](https://issues.apache.org/jira/browse/ARROW-13616)) and was 
co-developed and reviewed by many Apache Arrow community members. The 
cheatsheet was created by the community for the community, and anyone in the 
Arrow community is welcome and encouraged to help with maintenance and offer 
improvements. Thank you for your support!
\ No newline at end of file
diff --git a/img/20220427-arrow-r-cheatsheet-thumbnail.png 
b/img/20220427-arrow-r-cheatsheet-thumbnail.png
new file mode 100644
index 00..ecd3b0d763
Binary files /dev/null and b/img/20220427-arrow-r-cheatsheet-thumbnail.png 
differ



[arrow] branch master updated (a16be6b7b6 -> e1e782a454)

2022-04-22 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from a16be6b7b6 ARROW-16121: [Python] Deprecate the 
(common_)metadata(_path) attributes of ParquetDataset
 add e1e782a454 ARROW-15015: [R] Test / CI flag for ensuring all tests are 
run?

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml|  5 +++--
 ci/scripts/r_test.sh   |  6 ++
 r/tests/testthat/helper-skip.R | 34 ++
 3 files changed, 43 insertions(+), 2 deletions(-)



[arrow] branch master updated: ARROW-15800 [R] Implement bindings for `lubridate::as_date()` and `lubridate::as_datetime()`

2022-04-22 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 16638a4452 ARROW-15800 [R] Implement bindings for 
`lubridate::as_date()` and `lubridate::as_datetime()`
16638a4452 is described below

commit 16638a445201e7bf61358c96a6e70ab81df8a001
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Fri Apr 22 11:08:59 2022 -0500

ARROW-15800 [R] Implement bindings for `lubridate::as_date()` and 
`lubridate::as_datetime()`

Closes #12738 from dragosmg/as_date_as_datetime_take2

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 r/NEWS.md|   1 +
 r/R/dplyr-funcs-datetime.R   |  61 +-
 r/R/dplyr-funcs-type.R   |  83 +++
 r/man/arrow-package.Rd   |   4 +-
 r/tests/testthat/test-dplyr-funcs-type.R | 132 +--
 5 files changed, 222 insertions(+), 59 deletions(-)

diff --git a/r/NEWS.md b/r/NEWS.md
index eb5cd9a155..71a4d0be73 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -27,6 +27,7 @@
   * Added `make_difftime()` (duration constructor)
   * Added duration helper functions: `dyears()`, `dmonths()`, `dweeks()`, 
`ddays()`, `dhours()`, `dminutes()`, `dseconds()`, `dmilliseconds()`, 
`dmicroseconds()`, `dnanoseconds()`.
 * date-time functionality:
+  * Added `as_date()` and `as_datetime()`
   * Added `difftime` and `as.difftime()` 
   * Added `as.Date()` to convert to date
 * `median()` and `quantile()` will warn once about approximate calculations 
regardless of interactivity.
diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R
index a674a6402b..a6bc79ec7c 100644
--- a/r/R/dplyr-funcs-datetime.R
+++ b/r/R/dplyr-funcs-datetime.R
@@ -263,11 +263,11 @@ register_bindings_duration <- function() {
 # cast to timestamp if time1 and time2 are not dates or timestamp 
expressions
 # (the subtraction of which would output a `duration`)
 if (!call_binding("is.instant", time1)) {
-  time1 <- build_expr("cast", time1, options = cast_options(to_type = 
timestamp(timezone = "UTC")))
+  time1 <- build_expr("cast", time1, options = cast_options(to_type = 
timestamp()))
 }
 
 if (!call_binding("is.instant", time2)) {
-  time2 <- build_expr("cast", time2, options = cast_options(to_type = 
timestamp(timezone = "UTC")))
+  time2 <- build_expr("cast", time2, options = cast_options(to_type = 
timestamp()))
 }
 
 # if time1 or time2 are timestamps they cannot be expressed in "s" /seconds
@@ -476,3 +476,60 @@ duration_from_chunks <- function(chunks) {
   }
   duration
 }
+
+binding_as_date <- function(x,
+format = NULL,
+tryFormats = "%Y-%m-%d",
+origin = "1970-01-01") {
+
+  if (is.null(format) && length(tryFormats) > 1) {
+abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow")
+  }
+
+  if (call_binding("is.Date", x)) {
+return(x)
+
+# cast from character
+  } else if (call_binding("is.character", x)) {
+x <- binding_as_date_character(x, format, tryFormats)
+
+# cast from numeric
+  } else if (call_binding("is.numeric", x)) {
+x <- binding_as_date_numeric(x, origin)
+  }
+
+  build_expr("cast", x, options = cast_options(to_type = date32()))
+}
+
+binding_as_date_character <- function(x,
+  format = NULL,
+  tryFormats = "%Y-%m-%d") {
+  format <- format %||% tryFormats[[1]]
+  # unit = 0L is the identifier for seconds in valid_time32_units
+  build_expr("strptime", x, options = list(format = format, unit = 0L))
+}
+
+binding_as_date_numeric <- function(x, origin = "1970-01-01") {
+
+  # Arrow does not support direct casting from double to date32(), but for
+  # integer-like values we can go via int32()
+  # https://issues.apache.org/jira/browse/ARROW-15798
+  # TODO revisit if arrow decides to support double -> date casting
+  if (!call_binding("is.integer", x)) {
+x <- build_expr("cast", x, options = cast_options(to_type = int32()))
+  }
+
+  if (origin != "1970-01-01") {
+delta_in_sec <- call_binding("difftime", origin, "1970-01-01")
+# TODO: revisit once either of these issues is addressed:
+#   https://issues.apache.org/jira/browse/ARROW-16253 (helper function for
+#   casting from double to duration) or
+#   https://issues.apache.org/jira/browse/ARROW-15862 (casting from int32
+#   -> duration or doub

[arrow] branch master updated (0ce8ce8b19 -> c4b646e715)

2022-04-22 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 0ce8ce8b19 ARROW-11415: [R] map_batches wouldn't accept a dataset as 
an argument
 add c4b646e715 ARROW-14942: [R] Bindings for lubridate's dpicoseconds, 
dnanoseconds, dseconds, dmilliseconds, dmicroseconds

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  1 +
 r/R/dplyr-funcs-datetime.R   | 63 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 80 +++-
 3 files changed, 118 insertions(+), 26 deletions(-)



[arrow] branch master updated: ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch Linux

2022-04-21 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f7bccc51cc ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch 
Linux
f7bccc51cc is described below

commit f7bccc51cc8ab384134ee50a8dd0af03d937e8cd
Author: Jonathan Keane 
AuthorDate: Thu Apr 21 16:50:14 2022 -0500

ARROW-14638: [C++][R] Unknown C compiler / ccache on Arch Linux

Closes #11666 from jonkeane/ARROW-14638-ccache
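
The ccache setup that this commit adds to ci/scripts/r_docker_configure.sh can be sketched in isolation as below. This is a minimal illustration, not the script itself: the helper names `normalize_flag` and `write_ccache_config` and the `CCACHE_DEMO_HOME` variable are hypothetical, the files are written under a caller-supplied directory rather than the real home directory, and they are overwritten rather than appended to. The configuration values are the ones shown in the diff.

```shell
# Hypothetical helper: lower-case a boolean-like flag so that
# "TRUE", "True" and "true" all compare equal.
normalize_flag() {
  echo "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: write a ccache-enabled Makevars and a ccache.conf
# under the directory given as the first argument.
write_ccache_config() {
  target_home="$1"
  mkdir -p "${target_home}/.R" "${target_home}/.ccache"

  # The quoted heredoc delimiter keeps $(CCACHE) and $(VER) literal,
  # so that make (not the shell) expands them later.
  cat > "${target_home}/.R/Makevars" <<'EOF'
VER=
CCACHE=ccache
CC=$(CCACHE) gcc$(VER)
CXX=$(CCACHE) g++$(VER)
CXX11=$(CCACHE) g++$(VER)
EOF

  cat > "${target_home}/.ccache/ccache.conf" <<'EOF'
max_size = 5.0G
# tarballs are expanded freshly on install, so ctime always differs
sloppiness = include_file_ctime
# the temporary build directory name differs between runs
hash_dir = false
EOF
}

# Default to FALSE when the flag is unset, as the CI script does.
: "${R_CUSTOM_CCACHE:=FALSE}"
if [ "$(normalize_flag "${R_CUSTOM_CCACHE}")" = "true" ]; then
  # CCACHE_DEMO_HOME is an assumption of this sketch, not a real CI variable.
  write_ccache_config "${CCACHE_DEMO_HOME:-$(mktemp -d)}"
fi
```

Lower-casing the flag first means the task definition can pass `true`, `TRUE` or `True` without changing the comparison. With the flag left unset, the sketch writes nothing.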

Lead-authored-by: Jonathan Keane 
Co-authored-by: Neal Richardson 
Signed-off-by: Jonathan Keane 
---
 .env |  6 --
 ci/docker/linux-r.dockerfile |  3 +++
 ci/scripts/r_docker_configure.sh | 24 
 dev/tasks/r/azure.linux.yml  |  1 +
 dev/tasks/tasks.yml  |  9 +
 docker-compose.yml   |  1 +
 r/tools/nixlibs.R| 10 --
 7 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/.env b/.env
index a972654497..629dd04980 100644
--- a/.env
+++ b/.env
@@ -73,11 +73,13 @@ SPARK=master
 TURBODBC=latest
 
 # These correspond to images on Docker Hub that contain R, e.g. 
rhub/ubuntu-gcc-release:latest
-ARROW_R_DEV=TRUE
 R_IMAGE=ubuntu-gcc-release
 R_ORG=rhub
-R_PRUNE_DEPS=FALSE
 R_TAG=latest
+
+# Env vars for R builds
+ARROW_R_DEV=TRUE
+R_PRUNE_DEPS=FALSE
 TZ=UTC
 
 # -1 does not attempt to install a devtoolset version, any positive integer 
will install devtoolset-n
diff --git a/ci/docker/linux-r.dockerfile b/ci/docker/linux-r.dockerfile
index 1cbde3207e..804fb09f09 100644
--- a/ci/docker/linux-r.dockerfile
+++ b/ci/docker/linux-r.dockerfile
@@ -33,6 +33,9 @@ ENV DEVTOOLSET_VERSION=${devtoolset_version}
 ARG r_prune_deps=FALSE
 ENV R_PRUNE_DEPS=${r_prune_deps}
 
+ARG r_custom_ccache=false
+ENV R_CUSTOM_CCACHE=${r_custom_ccache}
+
 ARG tz="UTC"
 ENV TZ=${tz}
 
diff --git a/ci/scripts/r_docker_configure.sh b/ci/scripts/r_docker_configure.sh
index 518df1040d..9f93ba2b61 100755
--- a/ci/scripts/r_docker_configure.sh
+++ b/ci/scripts/r_docker_configure.sh
@@ -42,6 +42,30 @@ else
   apt-get update
 fi
 
+# Enable ccache if requested based on 
http://dirk.eddelbuettel.com/blog/2017/11/27/
+: ${R_CUSTOM_CCACHE:=FALSE}
+R_CUSTOM_CCACHE=`echo $R_CUSTOM_CCACHE | tr '[:upper:]' '[:lower:]'`
+if [ ${R_CUSTOM_CCACHE} = "true" ]; then
+  # install ccache
+  $PACKAGE_MANAGER install -y epel-release || true
+  $PACKAGE_MANAGER install -y ccache
+
+  mkdir -p ~/.R
+  echo "VER=
+CCACHE=ccache
+CC=\$(CCACHE) gcc\$(VER)
+CXX=\$(CCACHE) g++\$(VER)
+CXX11=\$(CCACHE) g++\$(VER)" >> ~/.R/Makevars
+
+  mkdir -p ~/.ccache/
+  echo "max_size = 5.0G
+# important for R CMD INSTALL *.tar.gz as tarballs are expanded freshly -> 
fresh ctime
+sloppiness = include_file_ctime
+# also important as the (temp.) directory name will differ
+hash_dir = false" >> ~/.ccache/ccache.conf
+fi
+
+
 # Special hacking to try to reproduce quirks on fedora-clang-devel on CRAN
 # which uses a bespoke clang compiled to use libc++
 # https://www.stats.ox.ac.uk/pub/bdr/Rconfig/r-devel-linux-x86_64-fedora-clang
diff --git a/dev/tasks/r/azure.linux.yml b/dev/tasks/r/azure.linux.yml
index 50b27aa7be..fd48141961 100644
--- a/dev/tasks/r/azure.linux.yml
+++ b/dev/tasks/r/azure.linux.yml
@@ -43,6 +43,7 @@ jobs:
   export R_IMAGE={{ r_image }}
   export R_TAG={{ r_tag }}
   export DEVTOOLSET_VERSION={{ devtoolset_version|default("-1") }}
+  export R_CUSTOM_CCACHE={{ r_custom_ccache|default("false") }}
   docker-compose pull --ignore-pull-failures r
   docker-compose build r
 displayName: Docker build
diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml
index b45dec61ff..a6a41ca274 100644
--- a/dev/tasks/tasks.yml
+++ b/dev/tasks/tasks.yml
@@ -1279,6 +1279,15 @@ tasks:
 template: r/github.linux.offline.build.yml
 
 
+  test-r-rhub-debian-gcc-release-custom-ccache:
+ci: azure
+template: r/azure.linux.yml
+params:
+  r_org: rhub
+  r_image: debian-gcc-release
+  r_tag: latest
+  r_custom_ccache: true
+
 {% for r_org, r_image, r_tag in [("rhub", "ubuntu-gcc-release", "latest"),
  ("rocker", "r-base", "latest"),
  ("rstudio", "r-base", "4.1-focal"),
diff --git a/docker-compose.yml b/docker-compose.yml
index f3c67fc4af..cff1a1665c 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1209,6 +1209,7 @@ services:
 devtoolset_version: ${DEVTOOLSET_VERSION}
 tz: ${TZ}
 r_prune_deps: ${R_PRUNE_DEPS}
+r_custom_ccache: ${R_CUSTOM_CCACHE}
 shm_size: *shm-size
 environment:
   LIBARROW_DOWNL

[arrow] branch master updated: ARROW-12659: [C++] Support is_valid as a guarantee

2022-04-21 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e03af446c ARROW-12659: [C++] Support is_valid as a guarantee
0e03af446c is described below

commit 0e03af446c328d0ef963510c3292cb14e092b917
Author: David Li 
AuthorDate: Thu Apr 21 13:55:15 2022 -0500

ARROW-12659: [C++] Support is_valid as a guarantee

This rebases #10253 and fixes it up to also address ARROW-15312, including 
a regression test.

This refactors how inequalities, is_valid, and is_null are treated in 
expression simplification, and updates the guarantees that the Parquet/Datasets 
emits for row groups to properly reflect nullability.

Closes #12891 from lidavidm/arrow-12659

Lead-authored-by: David Li 
Co-authored-by: Benjamin Kietzman 
Co-authored-by: Antoine Pitrou 
Signed-off-by: Jonathan Keane 
---
 cpp/src/arrow/compute/exec/expression.cc   | 405 +++--
 cpp/src/arrow/compute/exec/expression.h|   7 +-
 cpp/src/arrow/compute/exec/expression_test.cc  | 137 ++-
 cpp/src/arrow/compute/kernels/scalar_validity.cc   |  72 +++-
 .../arrow/compute/kernels/scalar_validity_test.cc  |  19 +
 cpp/src/arrow/dataset/file_csv_test.cc |   1 +
 cpp/src/arrow/dataset/file_ipc_test.cc |   1 +
 cpp/src/arrow/dataset/file_orc_test.cc |   1 +
 cpp/src/arrow/dataset/file_parquet.cc  |  26 +-
 cpp/src/arrow/dataset/file_parquet_test.cc |  21 +-
 cpp/src/arrow/dataset/test_util.h  |  26 +-
 cpp/src/arrow/type.h   |   2 +-
 cpp/src/arrow/util/stl_util_test.cc|   7 +
 cpp/src/arrow/util/vector.h|   4 +-
 docs/source/cpp/compute.rst|   9 +-
 docs/source/python/api/compute.rst |   1 +
 16 files changed, 570 insertions(+), 169 deletions(-)

diff --git a/cpp/src/arrow/compute/exec/expression.cc 
b/cpp/src/arrow/compute/exec/expression.cc
index 1ef5c6e7b9..8f7a9a1c8c 100644
--- a/cpp/src/arrow/compute/exec/expression.cc
+++ b/cpp/src/arrow/compute/exec/expression.cc
@@ -34,6 +34,7 @@
 #include "arrow/util/optional.h"
 #include "arrow/util/string.h"
 #include "arrow/util/value_parsing.h"
+#include "arrow/util/vector.h"
 
 namespace arrow {
 
@@ -110,7 +111,7 @@ namespace {
 
 std::string PrintDatum(const Datum& datum) {
   if (datum.is_scalar()) {
-if (!datum.scalar()->is_valid) return "null";
+if (!datum.scalar()->is_valid) return "null[" + datum.type()->ToString() + 
"]";
 
 switch (datum.type()->id()) {
   case Type::STRING:
@@ -129,6 +130,8 @@ std::string PrintDatum(const Datum& datum) {
 }
 
 return datum.scalar()->ToString();
+  } else if (datum.is_array()) {
+return "Array[" + datum.type()->ToString() + "]";
   }
   return datum.ToString();
 }
@@ -305,19 +308,49 @@ bool Expression::IsNullLiteral() const {
   return false;
 }
 
-bool Expression::IsSatisfiable() const {
-  if (type() && type()->id() == Type::NA) {
-return false;
+namespace {
+util::optional GetNullHandling(
+const Expression::Call& call) {
+  DCHECK_NE(call.function, nullptr);
+  if (call.function->kind() == compute::Function::SCALAR) {
+return static_cast(call.kernel)->null_handling;
   }
+  return util::nullopt;
+}
+}  // namespace
+
+bool Expression::IsSatisfiable() const {
+  if (!type()) return true;
+  if (type()->id() != Type::BOOL) return true;
 
   if (auto lit = literal()) {
 if (lit->null_count() == lit->length()) {
   return false;
 }
 
-if (lit->is_scalar() && lit->type()->id() == Type::BOOL) {
+if (lit->is_scalar()) {
   return lit->scalar_as().value;
 }
+
+return true;
+  }
+
+  if (field_ref()) return true;
+
+  auto call = CallNotNull(*this);
+
+  // invert(true_unless_null(x)) is always false or null by definition
+  // true_unless_null arises in simplification of inequalities below
+  if (call->function_name == "invert") {
+if (auto nested_call = call->arguments[0].call()) {
+  if (nested_call->function_name == "true_unless_null") return false;
+}
+  }
+
+  if (call->function_name == "and_kleene" || call->function_name == "and") {
+for (const Expression& arg : call->arguments) {
+  if (!arg.IsSatisfiable()) return false;
+}
   }
 
   return true;
@@ -370,9 +403,11 @@ Result BindNonRecursive(Expression::Call call, 
bool insert_implicit_
 
   compute::KernelContext kernel_context(exec_context);
   if (call.kernel->init) {
+const FunctionOptions* options =
+call.options ? call.opti

[arrow] branch master updated (20bc63a820 -> c73870acdc)

2022-04-20 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 20bc63a820 ARROW-16242: [Go] xerrors.Errorf and xerrors.Is are 
deprecated, fix linting
 add c73870acdc ARROW-15092: [R] Support 
create_package_with_all_dependencies() on non-linux systems

No new revisions were added by this update.

Summary of changes:
 cpp/thirdparty/download_dependencies.sh|  8 ---
 r/.gitignore   |  1 +
 r/R/install-arrow.R| 25 --
 .../tools/download_dependencies_R.sh   | 19 
 4 files changed, 35 insertions(+), 18 deletions(-)
 copy cpp/thirdparty/download_dependencies.sh => 
r/tools/download_dependencies_R.sh (74%)



[arrow] branch master updated (1f43abc933 -> 7ae86de86b)

2022-04-20 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 1f43abc933 ARROW-16074: [Docs] Document joins
 add 7ae86de86b ARROW-14944 [R] Implement `lubridate::make_difftime()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  1 +
 r/R/dplyr-funcs-datetime.R   | 66 +
 r/R/dplyr-funcs.R|  1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 89 
 4 files changed, 157 insertions(+)



[arrow] branch master updated: ARROW-15517: [R] Use WriteNode in write_dataset()

2022-04-19 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b3f4677b9 ARROW-15517: [R] Use WriteNode in write_dataset()
4b3f4677b9 is described below

commit 4b3f4677b995cb7263e4a4e65daf00189f638617
Author: Neal Richardson 
AuthorDate: Tue Apr 19 16:40:57 2022 -0500

ARROW-15517: [R] Use WriteNode in write_dataset()

This should allow streaming writes in more cases, e.g. with a join.

Closes #12316 from nealrichardson/write-node

Authored-by: Neal Richardson 
Signed-off-by: Jonathan Keane 
---
 r/R/arrowExports.R|  8 ++--
 r/R/dataset-format.R  |  4 +-
 r/R/dataset-write.R   | 87 +++
 r/R/dplyr.R   | 11 -
 r/R/metadata.R| 22 -
 r/R/parquet.R | 38 ---
 r/R/query-engine.R| 29 +---
 r/src/arrowExports.cpp| 62 +
 r/src/compute-exec.cpp| 49 ++--
 r/src/dataset.cpp | 24 --
 r/tests/testthat/test-dataset-write.R | 70 
 r/tests/testthat/test-metadata.R  | 36 ---
 12 files changed, 291 insertions(+), 149 deletions(-)

diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 7bf77f1e66..6b969336c9 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -420,6 +420,10 @@ ExecNode_Scan <- function(plan, dataset, filter, 
materialized_field_names) {
   .Call(`_arrow_ExecNode_Scan`, plan, dataset, filter, 
materialized_field_names)
 }
 
+ExecPlan_Write <- function(plan, final_node, metadata, file_write_options, 
filesystem, base_dir, partitioning, basename_template, existing_data_behavior, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group) {
+  invisible(.Call(`_arrow_ExecPlan_Write`, plan, final_node, metadata, 
file_write_options, filesystem, base_dir, partitioning, basename_template, 
existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, 
min_rows_per_group, max_rows_per_group))
+}
+
 ExecNode_Filter <- function(input, filter) {
   .Call(`_arrow_ExecNode_Filter`, input, filter)
 }
@@ -748,10 +752,6 @@ dataset___Scanner__schema <- function(sc) {
   .Call(`_arrow_dataset___Scanner__schema`, sc)
 }
 
-dataset___Dataset__Write <- function(file_write_options, filesystem, base_dir, 
partitioning, basename_template, scanner, existing_data_behavior, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group) {
-  invisible(.Call(`_arrow_dataset___Dataset__Write`, file_write_options, 
filesystem, base_dir, partitioning, basename_template, scanner, 
existing_data_behavior, max_partitions, max_open_files, max_rows_per_file, 
min_rows_per_group, max_rows_per_group))
-}
-
 dataset___Scanner__TakeRows <- function(scanner, indices) {
   .Call(`_arrow_dataset___Scanner__TakeRows`, scanner, indices)
 }
diff --git a/r/R/dataset-format.R b/r/R/dataset-format.R
index f00efd0350..acc1a41b02 100644
--- a/r/R/dataset-format.R
+++ b/r/R/dataset-format.R
@@ -390,7 +390,7 @@ ParquetFragmentScanOptions$create <- 
function(use_buffered_stream = FALSE,
 FileWriteOptions <- R6Class("FileWriteOptions",
   inherit = ArrowObject,
   public = list(
-update = function(table, ...) {
+update = function(column_names, ...) {
   check_additional_args <- function(format, passed_args) {
 if (format == "parquet") {
   supported_args <- names(formals(write_parquet))
@@ -437,7 +437,7 @@ FileWriteOptions <- R6Class("FileWriteOptions",
   if (self$type == "parquet") {
 dataset___ParquetFileWriteOptions__update(
   self,
-  ParquetWriterProperties$create(table, ...),
+  ParquetWriterProperties$create(column_names, ...),
   ParquetArrowWriterProperties$create(...)
 )
   } else if (self$type == "ipc") {
diff --git a/r/R/dataset-write.R b/r/R/dataset-write.R
index d7c73908e7..09b3ebdbe6 100644
--- a/r/R/dataset-write.R
+++ b/r/R/dataset-write.R
@@ -136,41 +136,88 @@ write_dataset <- function(dataset,
   if (inherits(dataset, "arrow_dplyr_query")) {
 # partitioning vars need to be in the `select` schema
 dataset <- ensure_group_vars(dataset)
-  } else if (inherits(dataset, "grouped_df")) {
-force(partitioning)
-# Drop the grouping metadata before writing; we've already consumed it
-# now to construct `partitioning` and don't want it in the metadata$r
-dataset <- dplyr::ungroup(dataset)
+  } else {
+if (inherits(dataset, "grouped_df")) {
+  force(partitioning)
+ 

[arrow] branch master updated: ARROW-16201: [R] SafeCallIntoR on 3.4

2022-04-14 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 63d2a9c856 ARROW-16201: [R] SafeCallIntoR on 3.4
63d2a9c856 is described below

commit 63d2a9c856969a2c05e12ae8857a135bceaf45c1
Author: Jonathan Keane 
AuthorDate: Thu Apr 14 15:28:44 2022 -0500

ARROW-16201: [R] SafeCallIntoR on 3.4

Disable the tests on 3.4 for now; 3.4 will leave our support window 
shortly, with the release of 4.2. Also skip a number of tests that failed 
because the `tzone` attribute is not present on 3.4.

Closes #12887 from jonkeane/ARROW-16201

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/tests/testthat/test-dplyr-funcs-datetime.R | 2 ++
 r/tests/testthat/test-dplyr-funcs-type.R | 2 ++
 r/tests/testthat/test-safe-call-into-r.R | 1 +
 3 files changed, 5 insertions(+)

diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R 
b/r/tests/testthat/test-dplyr-funcs-datetime.R
index 79b922f6e2..fc030779ec 100644
--- a/r/tests/testthat/test-dplyr-funcs-datetime.R
+++ b/r/tests/testthat/test-dplyr-funcs-datetime.R
@@ -16,6 +16,8 @@
 # under the License.
 
 skip_if(on_old_windows())
+# In 3.4 the lack of tzone attribute causes spurious failures
+skip_if_r_version("3.4.4")
 
 library(lubridate, warn.conflicts = FALSE)
 library(dplyr, warn.conflicts = FALSE)
diff --git a/r/tests/testthat/test-dplyr-funcs-type.R 
b/r/tests/testthat/test-dplyr-funcs-type.R
index 6c9d9ac07a..aa6667420c 100644
--- a/r/tests/testthat/test-dplyr-funcs-type.R
+++ b/r/tests/testthat/test-dplyr-funcs-type.R
@@ -877,6 +877,8 @@ test_that("as.Date() converts successfully from date, 
timestamp, integer, char a
 
 test_that("format date/time", {
   skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+  # In 3.4 the lack of tzone attribute causes spurious failures
+  skip_if_r_version("3.4.4")
 
   times <- tibble(
 datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = 
"Pacific/Marquesas"), NA),
diff --git a/r/tests/testthat/test-safe-call-into-r.R 
b/r/tests/testthat/test-safe-call-into-r.R
index e9438de58b..55cb68abdd 100644
--- a/r/tests/testthat/test-safe-call-into-r.R
+++ b/r/tests/testthat/test-safe-call-into-r.R
@@ -46,6 +46,7 @@ test_that("SafeCallIntoR works within RunWithCapturedR", {
 })
 
 test_that("SafeCallIntoR errors from the non-R thread", {
+  skip_if_r_version("3.4.4")
   skip_on_cran()
 
   expect_error(



[arrow] branch master updated: MINOR: [R] Add Dewey + Dragoș as authors

2022-04-14 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 76e4f53679 MINOR: [R] Add Dewey + Dragoș as authors
76e4f53679 is described below

commit 76e4f53679c5b4bbc1b26b3dd181ec990f7b9223
Author: Jonathan Keane 
AuthorDate: Thu Apr 14 12:49:31 2022 -0500

MINOR: [R] Add Dewey + Dragoș as authors

Also alphabetize the `aut` group by last name.

Closes #12889 from jonkeane/add-authors

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/DESCRIPTION | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index a5fb1ee9a4..46a3eefb68 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -5,8 +5,10 @@ Authors@R: c(
 person("Neal", "Richardson", email = "n...@ursalabs.org", role = c("aut", 
"cre")),
 person("Ian", "Cook", email = "ianmc...@gmail.com", role = c("aut")),
 person("Nic", "Crane", email = "thisis...@gmail.com", role = c("aut")),
-person("Jonathan", "Keane", email = "jke...@gmail.com", role = c("aut")),
+person("Dewey", "Dunnington", role = c("aut"), email = 
"de...@fishandwhistle.net", comment = c(ORCID = "-0002-9415-4582")),
 person("Romain", "Fran\u00e7ois", email = "rom...@rstudio.com", role = 
c("aut"), comment = c(ORCID = "-0002-2444-4226")),
+person("Jonathan", "Keane", email = "jke...@gmail.com", role = c("aut")),
+person("Drago\u0219", "Moldovan-Gr\u00fcnfeld", email = 
"dragos.m...@gmail.com", role = c("aut")),
 person("Jeroen", "Ooms", email = "jer...@berkeley.edu", role = c("aut")),
 person("Javier", "Luraschi", email = "jav...@rstudio.com", role = 
c("ctb")),
 person("Karl", "Dunkle Werner", email = "kar...@users.noreply.github.com", 
role = c("ctb"), comment = c(ORCID = "-0003-0523-7309")),



[arrow] branch master updated: ARROW-14168: [R] Warn only once about arrow function differences

2022-04-13 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d24ded7f7 ARROW-14168: [R] Warn only once about arrow function 
differences
9d24ded7f7 is described below

commit 9d24ded7f7d58717c9d78308b0c59ab7a9636006
Author: Edward Visel <1693477+alistair...@users.noreply.github.com>
AuthorDate: Wed Apr 13 18:26:37 2022 -0500

ARROW-14168: [R] Warn only once about arrow function differences

Addresses [ARROW-14168](https://issues.apache.org/jira/browse/ARROW-14168) 
by changing `median()` and `quantile()` to warn once, regardless of 
interactivity, and adjusts the tests accordingly.

Closes #12867 from alistaire47/feat/fun-diff-warn-once

Lead-authored-by: Edward Visel 
<1693477+alistair...@users.noreply.github.com>
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/NEWS.md   |   1 +
 r/R/dplyr-summarize.R   |  10 +-
 r/tests/testthat/helper-arrow.R |  14 +++
 r/tests/testthat/test-dplyr-summarize.R | 205 +---
 4 files changed, 132 insertions(+), 98 deletions(-)

diff --git a/r/NEWS.md b/r/NEWS.md
index 1a1f198e0f..2e6a993507 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -27,6 +27,7 @@
 * date-time functionality:
   * Added `difftime` and `as.difftime()` 
   * Added `as.Date()` to convert to date
+* `median()` and `quantile()` will warn once about approximate calculations 
regardless of interactivity.
 
 # arrow 7.0.0
 
diff --git a/r/R/dplyr-summarize.R b/r/R/dplyr-summarize.R
index d8e6c46d92..6484d56866 100644
--- a/r/R/dplyr-summarize.R
+++ b/r/R/dplyr-summarize.R
@@ -106,8 +106,9 @@ register_bindings_aggregate <- function() {
 # this warning (ARROW-14021)
 warn(
   "quantile() currently returns an approximate quantile in Arrow",
-  .frequency = ifelse(is_interactive(), "once", "always"),
-  .frequency_id = "arrow.quantile.approximate"
+  .frequency = "once",
+  .frequency_id = "arrow.quantile.approximate",
+  class = "arrow.quantile.approximate"
 )
 list(
   fun = "tdigest",
@@ -120,8 +121,9 @@ register_bindings_aggregate <- function() {
 # this warning (ARROW-14021)
 warn(
   "median() currently returns an approximate median in Arrow",
-  .frequency = ifelse(is_interactive(), "once", "always"),
-  .frequency_id = "arrow.median.approximate"
+  .frequency = "once",
+  .frequency_id = "arrow.median.approximate",
+  class = "arrow.median.approximate"
 )
 list(
   fun = "approximate_median",
diff --git a/r/tests/testthat/helper-arrow.R b/r/tests/testthat/helper-arrow.R
index 545f2d0440..873bb55712 100644
--- a/r/tests/testthat/helper-arrow.R
+++ b/r/tests/testthat/helper-arrow.R
@@ -56,6 +56,20 @@ test_that <- function(what, code) {
   })
 }
 
+# backport of 4.0.0 implementation
+if (getRversion() < "4.0.0") {
+  suppressWarnings <- function(expr, classes = "warning") {
+withCallingHandlers(
+  expr,
+  warning = function(w) {
+if (inherits(w, classes)) {
+  invokeRestart("muffleWarning")
+}
+  }
+)
+  }
+}
+
 # Wrapper to run tests that only touch R code even when the C++ library isn't
 # available (so that at least some tests are run on those platforms)
 r_only <- function(code) {
diff --git a/r/tests/testthat/test-dplyr-summarize.R 
b/r/tests/testthat/test-dplyr-summarize.R
index efadb2722d..73e3312ee0 100644
--- a/r/tests/testthat/test-dplyr-summarize.R
+++ b/r/tests/testthat/test-dplyr-summarize.R
@@ -17,7 +17,13 @@
 
 skip_if(on_old_windows())
 
-withr::local_options(list(arrow.summarise.sort = TRUE))
+withr::local_options(list(
+  arrow.summarise.sort = TRUE,
+  rlib_warning_verbosity = "verbose",
+  # This prevents the warning in `summarize()` about having grouped output 
without
+  # also specifying what to do with `.groups`
+  dplyr.summarise.inform = FALSE
+))
 
 library(dplyr, warn.conflicts = FALSE)
 library(stringr)
@@ -296,52 +302,56 @@ test_that("median()", {
   # output of type float64. The calls to median(int, ...) in the tests below
   # are enclosed in as.double() to work around this known difference.
 
-  # Use old testthat behavior here so we don't have to assert the same warning
-  # over and over
-  local_edition(2)
-
   # with groups
-  compare_dplyr_binding(
-.input %>%
-  group_by(some_grouping) %>%
-  summarize(
-med_dbl = median(dbl),
-med_int = as.double(median(int)),
-med_dbl_narmf = median(dbl, FALSE),
-med_int_narmf = as.double(median(int, na.rm = FALSE)),
-med

[arrow] branch master updated: ARROW-16165: [CI][Archery] Fix nightly query to crossbow to send reports

2022-04-12 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f5936f4bc ARROW-16165: [CI][Archery] Fix nightly query to crossbow to 
send reports
8f5936f4bc is described below

commit 8f5936f4bc5b02b3b3516831bbde23b55c213249
Author: Raúl Cumplido 
AuthorDate: Tue Apr 12 09:59:09 2022 -0500

ARROW-16165: [CI][Archery] Fix nightly query to crossbow to send reports

This PR fixes the current issue with our nightly reports, as seen here: 
https://github.com/ursacomputing/crossbow/runs/5840285120?check_suite_focus=true

The issue can be reproduced using the prefix that the crossbow reports use:
```
job_prefix=nightly-${{ inputs.report_type }}-$(date -I)
job_id=$(archery crossbow latest-prefix ${job_prefix})
```
Before the fix, the following query failed:
```
$ archery crossbow latest-prefix --no-fetch nightly-packaging-2022-04-10
Traceback (most recent call last):
  File "/home/raulcd/open_source/pyarrow-dev/bin/archery", line 33, in <module>
    sys.exit(load_entry_point('archery', 'console_scripts', 'archery')())
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/raulcd/open_source/pyarrow-dev/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/raulcd/open_source/arrow/dev/archery/archery/crossbow/cli.py", line 237, in latest_prefix
    latest = queue.latest_for_prefix(prefix)
  File "/home/raulcd/open_source/arrow/dev/archery/archery/crossbow/core.py", line 568, in latest_for_prefix
    latest_id += "-0"
TypeError: unsupported operand type(s) for +=: 'int' and 'str'
```

After the fix:
```
$ archery crossbow latest-prefix --no-fetch nightly-packaging-2022-04-10
nightly-packaging-2022-04-10-0
$ archery crossbow latest-prefix --no-fetch nightly-packaging
nightly-packaging-2022-04-11-0
```
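
The date check at the heart of the fix can be tried in isolation. The following is a minimal standalone sketch, not the archery code itself: the regular expression is copied from the diff, while the free functions `prefix_contains_date` and `needs_date_lookup` are illustrative names (in the diff the logic lives in methods of `Queue`).

```python
import re

# Same pattern as `_prefix_contains_date` in the diff.
PREFIX_DATE_PATTERN = re.compile(r'[\w\/-]*-(\d+)-(\d+)-(\d+)')


def prefix_contains_date(prefix):
    """Return the last 10 characters of the match (the date) or None."""
    match = PREFIX_DATE_PATTERN.match(prefix)
    if match:
        return match.group(0)[-10:]
    return None


def needs_date_lookup(prefix):
    """True when a nightly prefix has no date and the latest date must be searched for."""
    return prefix.startswith("nightly") and not prefix_contains_date(prefix)


if __name__ == "__main__":
    for prefix in ("nightly-packaging", "nightly-packaging-2022-04-10"):
        print(prefix, prefix_contains_date(prefix), needs_date_lookup(prefix))
```

A prefix that already carries a date skips the "find the latest date" branch, which is the branch that previously raised the `TypeError` shown above.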

Closes #12862 from raulcd/ARROW-16165

Authored-by: Raúl Cumplido 
Signed-off-by: Jonathan Keane 
---
 dev/archery/archery/crossbow/core.py|  9 -
 dev/archery/archery/crossbow/tests/test_core.py | 26 -
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/dev/archery/archery/crossbow/core.py 
b/dev/archery/archery/crossbow/core.py
index c41582ad40..1ad4763e29 100644
--- a/dev/archery/archery/crossbow/core.py
+++ b/dev/archery/archery/crossbow/core.py
@@ -542,6 +542,12 @@ class Queue(Repo):
 latest = -1
 return latest
 
+def _prefix_contains_date(self, prefix):
+prefix_date_pattern = re.compile(r'[\w\/-]*-(\d+)-(\d+)-(\d+)')
+match_prefix = prefix_date_pattern.match(prefix)
+if match_prefix:
+return match_prefix.group(0)[-10:]
+
 def _latest_prefix_date(self, prefix):
 pattern = re.compile(r'[\w\/-]*{}-(\d+)-(\d+)-(\d+)'.format(prefix))
 matches = list(filter(None, map(pattern.match, self.repo.branches)))
@@ -559,7 +565,8 @@ class Queue(Repo):
 return '{}-{}'.format(prefix, latest_id + 1)
 
 def latest_for_prefix(self, prefix):
-if prefix.startswith("nightly"):
+prefix_date = self._prefix_contains_date(prefix)
+if prefix.startswith("nightly") and not prefix_date:
 latest_id = self._latest_prefix_date(prefix)
 if not latest_id:
 raise RuntimeError(
diff --git a/dev/archery/archery/crossbow/tests/test_core.py 
b/dev/archery/archery/crossbow/tests/test_core.py
index 847aae2240..3d538b89b2 100644
--- a/dev/archery/archery/crossbow/tests/test_core.py
+++ b/dev/archery/archery/crossbow/tests/test_core.py
@@ -16,9 +16,10 @@
 # unde

[arrow] branch master updated: ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and `decimal_date()`

2022-04-11 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 80bba5cbde ARROW-14810 [R] Implement bindings for lubridate's 
`date_decimal()` and `decimal_date()`
80bba5cbde is described below

commit 80bba5cbdef77e809a7b9bfec36eb5d6a61f0b5d
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Mon Apr 11 13:16:13 2022 -0500

ARROW-14810 [R] Implement bindings for lubridate's `date_decimal()` and 
`decimal_date()`

This would allow the following operations:

``` r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

test_df <- tibble(
  a = c(2007.38998954347, 1970.77732069883, 2020.96061799722,
        2009.43465948477, 1975.71251467871, NA),
  b = as.POSIXct(
    c("2007-05-23 08:18:30", "1970-10-11 17:19:45", "2020-12-17 14:04:06",
      "2009-06-08 15:37:01", "1975-09-18 01:37:42", NA)
  )
)

test_df %>%
  mutate(
    decimal_date_from_date = decimal_date(b),
    date_from_decimal = date_decimal(a)
  )
#> # A tibble: 6 × 4
#>   a b   decimal_date_from_date date_from_decimal
#>
#> 1 2007. 2007-05-23 08:18:30  2007. 2007-05-23 08:18:30
#> 2 1971. 1970-10-11 17:19:45  1971. 1970-10-11 17:19:45
#> 3 2021. 2020-12-17 14:04:06  2021. 2020-12-17 14:04:06
#> 4 2009. 2009-06-08 15:37:01  2009. 2009-06-08 15:37:01
#> 5 1976. 1975-09-18 01:37:42  1976. 1975-09-18 01:37:42
#> 6   NA  NA NA  NA

test_df %>%
  arrow_table() %>%
  mutate(
    decimal_date_from_date = decimal_date(b),
    date_from_decimal = date_decimal(a)
  ) %>%
  collect()
#> # A tibble: 6 × 4
#>   a b   decimal_date_from_date date_from_decimal
#>
#> 1 2007. 2007-05-23 08:18:30  2007. 2007-05-23 08:18:30
#> 2 1971. 1970-10-11 17:19:45  1971. 1970-10-11 17:19:45
#> 3 2021. 2020-12-17 14:04:06  2021. 2020-12-17 14:04:06
#> 4 2009. 2009-06-08 15:37:01  2009. 2009-06-08 15:37:01
#> 5 1976. 1975-09-18 01:37:42  1976. 1975-09-18 01:37:42
#> 6   NA  NA NA  NA
```

Created on 2022-03-28 by the [reprex 
package](https://reprex.tidyverse.org) (v2.0.1)
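
For readers unfamiliar with the conversion, the arithmetic behind a decimal date can be sketched as follows. This is an illustrative Python version, not the Arrow binding or lubridate's code: it assumes timezone-aware UTC datetimes (the R example above uses the session timezone), and the helper `_year_bounds` is a hypothetical name. A decimal date is the year plus the fraction of that year that has elapsed.

```python
import math
from datetime import datetime, timezone


def _year_bounds(year):
    """Return the first instant of `year` and of the following year, in UTC."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


def decimal_date(dt):
    """Convert a timezone-aware datetime to a fractional year."""
    start, end = _year_bounds(dt.year)
    # Dividing two timedeltas yields a float: the elapsed fraction of the year.
    return dt.year + (dt - start) / (end - start)


def date_decimal(value):
    """Convert a fractional year back to a UTC datetime."""
    year = math.floor(value)
    start, end = _year_bounds(year)
    return start + (value - year) * (end - start)


if __name__ == "__main__":
    moment = datetime(2007, 5, 23, 8, 18, 30, tzinfo=timezone.utc)
    print(decimal_date(moment))
    print(date_decimal(decimal_date(moment)))
```

Because the length of the year is taken from the two year boundaries, leap years are handled without a special case. The round trip is only exact up to floating-point precision.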

Closes #12707 from dragosmg/decimal_dates

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 r/NEWS.md|  7 +++--
 r/R/dplyr-funcs-datetime.R   | 42 
 r/tests/testthat/test-dplyr-funcs-datetime.R | 31 +++-
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/r/NEWS.md b/r/NEWS.md
index 0a7d30d2a3..1a1f198e0f 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -22,10 +22,11 @@
 * `read_csv_arrow()`'s readr-style type `T` is now mapped to `timestamp(unit = 
"ns")` instead of `timestamp(unit = "s")`.
 * `lubridate`:
   * component extraction functions: `tz()` (timezone), `semester()` 
(semester), `dst()` (daylight savings time indicator), `date()` (extract date), 
`epiyear()` (epiyear), improvements to `month()`, which now works with integer 
inputs.
-  * `make_date()` & `make_datetime()` + `ISOdatetime()` & `ISOdate()` to 
create date-times from numeric representations. 
+  * Added `make_date()` & `make_datetime()` + `ISOdatetime()` & `ISOdate()` to 
create date-times from numeric representations. 
+  * Added `decimal_date()` and `date_decimal()`
 * date-time functionality:
-  * `difftime` and `as.difftime()` 
-  * `as.Date()` to convert to date
+  * Added `difftime` and `as.difftime()` 
+  * Added `as.Date()` to convert to date
 
 # arrow 7.0.0
 
diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R
index 754d02a436..1ca485f56e 100644
--- a/r/R/dplyr-funcs-datetime.R
+++ b/r/R/dplyr-funcs-datetime.R
@@ -270,6 +270,20 @@ register_bindings_duration <- function() {
   time2 <- build_expr("cast", time2, options = cast_options(to_type = 
timestamp(timezone = "UTC")))
 }
 
+# if time1 or time2 are timestamps they cannot be expressed in "s" /seconds
+# otherwise they cannot be added subtracted with durations
+# TODO delete the casting to "us" once
+# https://issues.apache.org/jira/browse/ARROW-16060 is solved
+if (inherits(time1, "Expression"

[arrow] branch master updated: ARROW-14442: [R] fix behaviour when converting timestamps with "" as tzone

2022-04-11 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 633687c1e6 ARROW-14442: [R] fix behaviour when converting timestamps 
with "" as tzone
633687c1e6 is described below

commit 633687c1e6f940c78986af206a4bb2a478f25906
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Mon Apr 11 12:15:04 2022 -0500

ARROW-14442: [R] fix behaviour when converting timestamps with "" as tzone

Closes #12240 from dragosmg/timestampts_missing_timezone

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 r/src/type_infer.cpp |  6 --
 r/tests/testthat/test-Array.R| 27 ---
 r/tests/testthat/test-dplyr-funcs-datetime.R |  4 
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/r/src/type_infer.cpp b/r/src/type_infer.cpp
index 2568757aa2..75b1e85c42 100644
--- a/r/src/type_infer.cpp
+++ b/r/src/type_infer.cpp
@@ -72,7 +72,8 @@ std::shared_ptr 
InferArrowTypeFromVector(SEXP x) {
   } else if (Rf_inherits(x, "POSIXct")) {
 auto tzone_sexp = Rf_getAttrib(x, symbols::tzone);
 if (Rf_isNull(tzone_sexp)) {
-  return timestamp(TimeUnit::MICRO);
+  auto systzone_sexp = cpp11::package("base")["Sys.timezone"];
+  return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(systzone_sexp(), 0)));
 } else {
   return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(tzone_sexp, 0)));
 }
@@ -88,7 +89,8 @@ std::shared_ptr 
InferArrowTypeFromVector(SEXP x) {
   if (Rf_inherits(x, "POSIXct")) {
 auto tzone_sexp = Rf_getAttrib(x, symbols::tzone);
 if (Rf_isNull(tzone_sexp)) {
-  return timestamp(TimeUnit::MICRO);
+  auto systzone_sexp = cpp11::package("base")["Sys.timezone"];
+  return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(systzone_sexp(), 0)));
 } else {
   return timestamp(TimeUnit::MICRO, CHAR(STRING_ELT(tzone_sexp, 0)));
 }
diff --git a/r/tests/testthat/test-Array.R b/r/tests/testthat/test-Array.R
index 15d6d79247..2f75efb3d6 100644
--- a/r/tests/testthat/test-Array.R
+++ b/r/tests/testthat/test-Array.R
@@ -260,11 +260,11 @@ test_that("array supports POSIXct (ARROW-3340)", {
   expect_array_roundtrip(times2, timestamp("us", "US/Eastern"))
 })
 
-test_that("array supports POSIXct without timezone", {
-  # Make sure timezone is not set
+test_that("array uses local timezone for POSIXct without timezone", {
   withr::with_envvar(c(TZ = ""), {
 times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 
1:10
-expect_array_roundtrip(times, timestamp("us", ""))
+expect_equal(attr(times, "tzone"), NULL)
+expect_array_roundtrip(times, timestamp("us", Sys.timezone()))
 
 # Also test the INTSXP code path
 skip("Ingest_POSIXct only implemented for REALSXP")
@@ -272,6 +272,27 @@ test_that("array supports POSIXct without timezone", {
 attributes(times_int) <- attributes(times)
 expect_array_roundtrip(times_int, timestamp("us", ""))
   })
+
+  # If there is a timezone set, we record that
+  withr::with_timezone("Pacific/Marquesas", {
+times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 
1:10
+expect_equal(attr(times, "tzone"), "Pacific/Marquesas")
+expect_array_roundtrip(times, timestamp("us", "Pacific/Marquesas"))
+
+times_with_tz <- strptime(
+  "2019-02-03 12:34:56",
+  format = "%Y-%m-%d %H:%M:%S",
+  tz = "Asia/Katmandu") + 1:10
+expect_equal(attr(times, "tzone"), "Asia/Katmandu")
+expect_array_roundtrip(times, timestamp("us", "Asia/Katmandu"))
+  })
+
+  # and although the TZ is NULL in R, we set it to the Sys.timezone()
+  withr::with_timezone(NA, {
+times <- strptime("2019-02-03 12:34:56", format = "%Y-%m-%d %H:%M:%S") + 
1:10
+expect_equal(attr(times, "tzone"), NULL)
+expect_array_roundtrip(times, timestamp("us", Sys.timezone()))
+  })
 })
 
 test_that("Timezone handling in Arrow roundtrip (ARROW-3543)", {
diff --git a/r/tests/testthat/test-dplyr-funcs-datetime.R 
b/r/tests/testthat/test-dplyr-funcs-datetime.R
index 16e4958f1c..c901742f65 100644
--- a/r/tests/testthat/test-dplyr-funcs-datetime.R
+++ b/r/tests/testthat/test-dplyr-funcs-datetime.R
@@ -693,7 +693,6 @@ test_that("extract yday from date", {
 })
 
 test_that("leap_year mirror lubridate", {
-
   compare_dplyr_binding(
 .input %>%
   mutate(x = leap_year(date)) %>%
@@ -721,7 +720,6 @@ test_that("leap_year mirror lubridate", {
   ))
 )
   )
-
 })
 
 test_that("am/pm mirror lubridate", {
@@ -741,10 +739,8 @@ test_that("am/pm mirror lubridate", {
 ),
 format = "%Y-%m-%d %H:%M:%S"
   )
-
 )
   )
-
 })
 
 test_that("extract tz", {



[arrow] branch master updated: ARROW-16156: [R] Clarify warning message for features not turned on in .onAttach()

2022-04-08 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 961ec771b9 ARROW-16156: [R] Clarify warning message for features not 
turned on in .onAttach()
961ec771b9 is described below

commit 961ec771b9bb84b71be3af398e7b3df3e10f291b
Author: Dewey Dunnington 
AuthorDate: Fri Apr 8 17:22:16 2022 -0500

ARROW-16156: [R] Clarify warning message for features not turned on in 
.onAttach()

After ARROW-15818 (#12564) we get an extra message on package load because 
"engine" was added to `arrow_info()$capabilities` and few if any users will 
have this turned on for at least the next release:

```r
library(arrow)
#> See arrow_info() for available features
```

This PR adds "engine" to the list of features we don't message users about 
and rewords the message so that it is clearer why it is being shown:

```r
library(arrow)
#> Some features of Arrow C++ are turned off. Run `arrow_info()` for more 
information.
```
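
The check that decides whether the startup message is shown is small enough to restate. Below is a Python rendering of the R helper `some_features_are_off()` from the diff, offered as a sketch only: the dict-based `features` argument stands in for the named logical vector in `arrow_info()$capabilities`, and the example capability names passed in are illustrative.

```python
# Features that should not trigger the startup message even when disabled
# (the blocklist from the diff, including the newly added "engine").
BLOCKLIST = ("lzo", "bz2", "brotli", "engine")


def some_features_are_off(features, blocklist=BLOCKLIST):
    """Return True if any feature outside the blocklist is disabled.

    `features` maps a feature name to True (enabled) or False (disabled).
    """
    relevant = [enabled for name, enabled in features.items()
                if name not in blocklist]
    return not all(relevant)


if __name__ == "__main__":
    # "engine" being off alone no longer triggers the message.
    print(some_features_are_off({"dataset": True, "engine": False}))
    # A disabled feature outside the blocklist still does.
    print(some_features_are_off({"dataset": True, "s3": False}))
```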

Closes #12842 from paleolimbot/r-onattach

Lead-authored-by: Dewey Dunnington 
Co-authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 r/R/arrow-package.R | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 896363a478..3c810bb8f2 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -107,7 +107,12 @@
   #
   # Let's print a message if some are off
   if (some_features_are_off(features)) {
-packageStartupMessage("See arrow_info() for available features")
+packageStartupMessage(
+  paste(
+"Some features are not enabled in this build of Arrow.",
+"Run `arrow_info()` for more information."
+  )
+)
   }
 })
   }
@@ -264,7 +269,7 @@ arrow_info <- function() {
 some_features_are_off <- function(features) {
   # `features` is a named logical vector (as in arrow_info()$capabilities)
   # Let's exclude some less relevant ones
-  blocklist <- c("lzo", "bz2", "brotli")
+  blocklist <- c("lzo", "bz2", "brotli", "engine")
   # Return TRUE if any of the other features are FALSE
   !all(features[setdiff(names(features), blocklist)])
 }



[arrow] branch master updated: MINOR: [R] Fix compiler warning/CMD check NOTE when compiling with ARROW_R_WITH_ENGINE

2022-04-08 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d197ad31c3 MINOR: [R] Fix compiler warning/CMD check NOTE when 
compiling with ARROW_R_WITH_ENGINE
d197ad31c3 is described below

commit d197ad31c3d7c16ecee74cb76a71ce397e905b3b
Author: Dewey Dunnington 
AuthorDate: Fri Apr 8 12:24:17 2022 -0500

MINOR: [R] Fix compiler warning/CMD check NOTE when compiling with 
ARROW_R_WITH_ENGINE

After ARROW-16033 (#12721) we get this compiler warning when compiling with 
`ARROW_R_WITH_ENGINE`:

```
   compute-exec.cpp:304:17: warning: 'Init' overrides a member function but 
is not marked 'override' [-Winconsistent-missing-override]
 arrow::Status Init(const std::shared_ptr& schema) {
   ^
   
/Users/deweydunnington/.r-arrow-dev-build/dist/include/arrow/compute/exec/options.h:153:18:
 note: overridden virtual function is here
 virtual Status Init(const std::shared_ptr& schema) = 0;
^
   1 warning generated.
```

This PR just adds the requisite `override`.

Closes #12823 from paleolimbot/r-minor-override

Authored-by: Dewey Dunnington 
Signed-off-by: Jonathan Keane 
---
 r/src/compute-exec.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/src/compute-exec.cpp b/r/src/compute-exec.cpp
index a1a679144d..e7d8df55bb 100644
--- a/r/src/compute-exec.cpp
+++ b/r/src/compute-exec.cpp
@@ -301,7 +301,7 @@ class AccumulatingConsumer : public 
compute::SinkNodeConsumer {
  public:
   const std::vector>& batches() { return 
batches_; }
 
-  arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema) {
+  arrow::Status Init(const std::shared_ptr<arrow::Schema>& schema) override {
 schema_ = schema;
 return arrow::Status::OK();
   }



[arrow] branch master updated: ARROW-15471: [R] ExtensionType support in R

2022-04-08 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 489aada557 ARROW-15471: [R] ExtensionType support in R
489aada557 is described below

commit 489aada557f267f4b9745b039034e9f5b0e1f485
Author: Dewey Dunnington 
AuthorDate: Fri Apr 8 10:52:12 2022 -0500

ARROW-15471: [R] ExtensionType support in R

This PR implements extension type support and registration in the R 
bindings (as has been possible in the Python bindings for some time). The 
details still need to be worked out, but we at least have a working pattern:

``` r
library(arrow, warn.conflicts = FALSE)
library(R6)

SomeExtensionTypeSubclass <- R6Class(
  "SomeExtensionTypeSubclass", inherit = arrow:::ExtensionType,
  public = list(
    some_custom_method = function() {
      private$some_custom_field
    },

    .Deserialize = function(storage_type, extension_name, extension_metadata) {
      private$some_custom_field <- head(extension_metadata, 5)
    }
  ),
  private = list(
    some_custom_field = NULL
  )
)

SomeExtensionArraySubclass <- R6Class(
  "SomeExtensionArraySubclass", inherit = arrow:::ExtensionArray,
  public = list(
    some_custom_method = function() {
      self$type$some_custom_method()
    }
  )
)

type <- arrow:::MakeExtensionType(
  int32(),
  "some_extension_subclass",
  charToRaw("some custom metadata"),
  type_class = SomeExtensionTypeSubclass,
  array_class = SomeExtensionArraySubclass
)

arrow:::RegisterExtensionType(type)

# survives the C API round trip
ptr_type <- arrow:::allocate_arrow_schema()
type$export_to_c(ptr_type)
type2 <- arrow:::DataType$import_from_c(ptr_type)

type2
#> SomeExtensionTypeSubclass
#> SomeExtensionTypeSubclass 
type2$some_custom_method()
#> [1] 73 6f 6d 65 20

(array <- type$WrapArray(Array$create(1:10)))
#> SomeExtensionArraySubclass
#> >
#> [
#>   1,
#>   2,
#>   3,
#>   4,
#>   5,
#>   6,
#>   7,
#>   8,
#>   9,
#>   10
#> ]
array$some_custom_method()
#> [1] 73 6f 6d 65 20

ptr_array <- arrow:::allocate_arrow_array()
array$export_to_c(ptr_array, ptr_type)
(array2 <- Array$import_from_c(ptr_array, ptr_type))
#> SomeExtensionArraySubclass
#> >
#> [
#>   1,
#>   2,
#>   3,
#>   4,
#>   5,
#>   6,
#>   7,
#>   8,
#>   9,
#>   10
#> ]

arrow:::delete_arrow_schema(ptr_type)
arrow:::delete_arrow_array(ptr_array)
```

Created on 2022-02-18 by the [reprex 
package](https://reprex.tidyverse.org) (v2.0.1)

Closes #12467 from paleolimbot/r-extension-type

Authored-by: Dewey Dunnington 
Signed-off-by: Jonathan Keane 
---
 r/DESCRIPTION|   1 +
 r/NAMESPACE  |   9 +
 r/R/arrow-package.R  |   5 +
 r/R/arrowExports.R   |  36 +++
 r/R/extension.R  | 545 +++
 r/_pkgdown.yml   |   5 +
 r/man/ExtensionArray.Rd  |  23 ++
 r/man/ExtensionType.Rd   |  48 +++
 r/man/new_extension_type.Rd  | 167 +++
 r/man/vctrs_extension_array.Rd   |  50 
 r/src/array.cpp  |   2 +
 r/src/array_to_vector.cpp|  33 +++
 r/src/arrowExports.cpp   | 150 ++
 r/src/datatype.cpp   |   2 +
 r/src/extension-impl.cpp | 198 +
 r/src/extension.h|  75 +
 r/tests/testthat/_snaps/extension.md |  10 +
 r/tests/testthat/test-extension.R| 345 ++
 18 files changed, 1704 insertions(+)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 36a55c05b2..a5fb1ee9a4 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -108,6 +108,7 @@ Collate:
 'table.R'
 'dplyr.R'
 'duckdb.R'
+'extension.R'
 'feather.R'
 'field.R'
 'filesystem.R'
diff --git a/r/NAMESPACE b/r/NAMESPACE
index f32e73f537..da43a3f511 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -134,6 +134,8 @@ export(DictionaryArray)
 export(DirectoryPartitioning)
 export(DirectoryPartitioningFactory)
 export(Expression)
+export(ExtensionArray)
+export(ExtensionType)
 export(FeatherReader)
 export(Field)
 export(FileFormat)
@@ -267,6 +269,8 @@ export(match_arrow)
 export(matches)
 export(mmap_create)
 export(mmap_open)
+export(new_extension_array)
+export(new_extension_type)
 export(null)
 expo

[arrow] branch master updated: ARROW-16038: [R] different behavior from dplyr when mutate's `.keep` option is set

2022-04-07 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new b8299436c8 ARROW-16038: [R] different behavior from dplyr when 
mutate's `.keep` option is set
b8299436c8 is described below

commit b8299436c8b1a2d7cd3d6e019a2b750893a3af87
Author: SAm Albers 
AuthorDate: Thu Apr 7 16:07:34 2022 -0500

ARROW-16038: [R] different behavior from dplyr when mutate's `.keep` option 
is set

This PR does two things to match dplyr's behaviour around column order:

1) Mimics dplyr's implementation of `mutate(..., .keep = "none")`, which 
appends new columns after any existing columns that are kept, as per 
[tidyverse/dplyr#6086](https://github.com/tidyverse/dplyr/issues/6086).

2) As per the same 
[discussion](https://github.com/tidyverse/dplyr/issues/6086), this requires a 
bespoke approach to `transmute`, as it is not simply a wrapper for `mutate(..., 
.keep = "none")`. This cascades into needing to catch a couple of edge cases.

I have also added tests for this behaviour.
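
The two ordering rules can be sketched in base R, without arrow or dplyr. The 
column names are hypothetical, and plain strings stand in for the labels that 
`as_label()` would produce; the expressions themselves mirror the ones in the 
patch.

``` r
# Hypothetical column names, for illustration only
old_vars <- c("int", "dbl", "chr") # columns already in the data
new_vars <- c("new", "dbl")        # columns named in mutate(), in call order

# .keep = "none": existing columns that are kept come first (in their
# original order), then the genuinely new columns
new_cols_last <- c(intersect(old_vars, new_vars), setdiff(new_vars, old_vars))
new_cols_last
#> [1] "dbl" "new"

# transmute(dbl, int = int + 6L): follow the order of the call; unnamed
# arguments fall back to their expression labels
cur_exprs <- c("dbl", int = "int + 6L")
transmute_order <- names(cur_exprs)
transmute_order[!nzchar(transmute_order)] <- cur_exprs[!nzchar(transmute_order)]
transmute_order
#> [1] "dbl" "int"
```

So `mutate()` keeps a retained column in its original position, while 
`transmute()` follows the order in which the arguments were written.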

Closes #12818 from boshek/mutate-keep

Authored-by: SAm Albers 
Signed-off-by: Jonathan Keane 
---
 r/NAMESPACE  |  1 +
 r/R/arrow-package.R  |  2 +-
 r/R/dplyr-mutate.R   | 17 +++--
 r/tests/testthat/test-dplyr-mutate.R | 34 +-
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/r/NAMESPACE b/r/NAMESPACE
index 7cb89b0a53..f32e73f537 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -331,6 +331,7 @@ importFrom(rlang,":=")
 importFrom(rlang,.data)
 importFrom(rlang,abort)
 importFrom(rlang,as_function)
+importFrom(rlang,as_label)
 importFrom(rlang,as_quosure)
 importFrom(rlang,call2)
 importFrom(rlang,caller_env)
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 2fab03d08c..256bc7aefa 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -23,7 +23,7 @@
 #' @importFrom rlang eval_tidy new_data_mask syms env new_environment env_bind 
set_names exec
 #' @importFrom rlang is_bare_character quo_get_expr quo_get_env quo_set_expr 
.data seq2 is_interactive
 #' @importFrom rlang expr caller_env is_character quo_name is_quosure enexpr 
enexprs as_quosure
-#' @importFrom rlang is_list call2 is_empty as_function
+#' @importFrom rlang is_list call2 is_empty as_function as_label
 #' @importFrom tidyselect vars_pull vars_rename vars_select eval_select
 #' @useDynLib arrow, .registration = TRUE
 #' @keywords internal
diff --git a/r/R/dplyr-mutate.R b/r/R/dplyr-mutate.R
index 986f29cc1d..07802f8c83 100644
--- a/r/R/dplyr-mutate.R
+++ b/r/R/dplyr-mutate.R
@@ -94,7 +94,10 @@ mutate.arrow_dplyr_query <- function(.data,
 
   # Respect .keep
   if (.keep == "none") {
-.data$selected_columns <- .data$selected_columns[new_vars]
+## for consistency with dplyr, this appends new columns after existing 
columns
+## by specifying the order
+new_cols_last <- c(intersect(old_vars, new_vars), setdiff(new_vars, 
old_vars))
+.data$selected_columns <- .data$selected_columns[new_cols_last]
   } else if (.keep != "all") {
 # "used" or "unused"
 used_vars <- unlist(lapply(exprs, all.vars), use.names = FALSE)
@@ -112,7 +115,17 @@ mutate.Dataset <- mutate.ArrowTabular <- 
mutate.RecordBatchReader <- mutate.arro
 
 transmute.arrow_dplyr_query <- function(.data, ...) {
   dots <- check_transmute_args(...)
-  dplyr::mutate(.data, !!!dots, .keep = "none")
+  has_null <- map_lgl(dots, quo_is_null)
+  .data <- dplyr::mutate(.data, !!!dots, .keep = "none")
+  if (is_empty(dots) | any(has_null)) {
+return(.data)
+  }
+
+  ## keeping with: https://github.com/tidyverse/dplyr/issues/6086
+  cur_exprs <- map_chr(dots, as_label)
+  transmute_order <- names(cur_exprs)
+  transmute_order[!nzchar(transmute_order)] <- 
cur_exprs[!nzchar(transmute_order)]
+  dplyr::select(.data, all_of(transmute_order))
 }
 transmute.Dataset <- transmute.ArrowTabular <- transmute.RecordBatchReader <- 
transmute.arrow_dplyr_query
 
diff --git a/r/tests/testthat/test-dplyr-mutate.R 
b/r/tests/testthat/test-dplyr-mutate.R
index 61d9edac1e..a746335940 100644
--- a/r/tests/testthat/test-dplyr-mutate.R
+++ b/r/tests/testthat/test-dplyr-mutate.R
@@ -74,6 +74,16 @@ test_that("transmute", {
   )
 })
 
+test_that("transmute respect bespoke dplyr implementation", {
+  ## see: https://github.com/tidyverse/dplyr/issues/6086
+  compare_dplyr_binding(
+.input %>%
+  transmute(dbl, int = int + 6L) %>%
+  collect(),
+tbl
+  )
+})
+
 test_that("transmute() with NULL inputs", {
   compare_dplyr_binding(
 .input %>%
@@ -92,6 +102,20 @@ test_that

[arrow] branch master updated: ARROW-15841: [R] Implement SafeCallIntoR to safely call the R API from another thread

2022-04-07 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new e110eac71a ARROW-15841: [R] Implement SafeCallIntoR to safely call the 
R API from another thread
e110eac71a is described below

commit e110eac71aae63041a595fc1c8cc51960ba97f06
Author: Dewey Dunnington 
AuthorDate: Thu Apr 7 11:50:28 2022 -0500

ARROW-15841: [R] Implement SafeCallIntoR to safely call the R API from 
another thread

This is a very WIP draft that currently just sketches a few things related 
to calling into R from other threads. Some code to get started:

``` r
arrow:::TestSafeCallIntoR(
  list(
    function() "string one",
    function() "string two"
  )
)
#> [1] "string one" "string two"

arrow:::TestSafeCallIntoR(
  list(
    function() stop("This is an error!")
  )
)
#> Error in (function () : This is an error!
```

Closes #12558 from paleolimbot/r-safe-call-into

Authored-by: Dewey Dunnington 
Signed-off-by: Jonathan Keane 
---
 r/R/arrow-package.R  |   5 ++
 r/R/arrowExports.R   |   8 ++
 r/src/arrowExports.cpp   |  33 +++
 r/src/safe-call-into-r-impl.cpp  |  89 +++
 r/src/safe-call-into-r.h | 145 +++
 r/tests/testthat/test-safe-call-into-r.R |  60 +
 6 files changed, 340 insertions(+)

diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 509382e5da..2fab03d08c 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -31,6 +31,11 @@
 
 #' @importFrom vctrs s3_register vec_size vec_cast vec_unique
 .onLoad <- function(...) {
+  if (arrow_available()) {
+# Make sure C++ knows on which thread it is safe to call the R API
+InitializeMainRThread()
+  }
+
   dplyr_methods <- paste0(
 "dplyr::",
 c(
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index f43ef730ca..5ef6312196 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -1732,6 +1732,14 @@ ipc___RecordBatchStreamWriter__Open <- function(stream, 
schema, use_legacy_forma
   .Call(`_arrow_ipc___RecordBatchStreamWriter__Open`, stream, schema, 
use_legacy_format, metadata_version)
 }
 
+InitializeMainRThread <- function() {
+  invisible(.Call(`_arrow_InitializeMainRThread`))
+}
+
+TestSafeCallIntoR <- function(r_fun_that_returns_a_string, opt) {
+  .Call(`_arrow_TestSafeCallIntoR`, r_fun_that_returns_a_string, opt)
+}
+
 Array__GetScalar <- function(x, i) {
   .Call(`_arrow_Array__GetScalar`, x, i)
 }
diff --git a/r/src/arrowExports.cpp b/r/src/arrowExports.cpp
index 45a883321d..0a29ed0872 100644
--- a/r/src/arrowExports.cpp
+++ b/r/src/arrowExports.cpp
@@ -6822,6 +6822,37 @@ extern "C" SEXP 
_arrow_ipc___RecordBatchStreamWriter__Open(SEXP stream_sexp, SEX
 }
 #endif
 
+// safe-call-into-r-impl.cpp
+#if defined(ARROW_R_WITH_ARROW)
+void InitializeMainRThread();
+extern "C" SEXP _arrow_InitializeMainRThread(){
+BEGIN_CPP11
+   InitializeMainRThread();
+   return R_NilValue;
+END_CPP11
+}
+#else
+extern "C" SEXP _arrow_InitializeMainRThread(){
+   Rf_error("Cannot call InitializeMainRThread(). See 
https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow 
C++ libraries. ");
+}
+#endif
+
+// safe-call-into-r-impl.cpp
+#if defined(ARROW_R_WITH_ARROW)
+std::string TestSafeCallIntoR(cpp11::function r_fun_that_returns_a_string, 
std::string opt);
+extern "C" SEXP _arrow_TestSafeCallIntoR(SEXP 
r_fun_that_returns_a_string_sexp, SEXP opt_sexp){
+BEGIN_CPP11
+   arrow::r::Input::type 
r_fun_that_returns_a_string(r_fun_that_returns_a_string_sexp);
+   arrow::r::Input::type opt(opt_sexp);
+   return cpp11::as_sexp(TestSafeCallIntoR(r_fun_that_returns_a_string, 
opt));
+END_CPP11
+}
+#else
+extern "C" SEXP _arrow_TestSafeCallIntoR(SEXP 
r_fun_that_returns_a_string_sexp, SEXP opt_sexp){
+   Rf_error("Cannot call TestSafeCallIntoR(). See 
https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow 
C++ libraries. ");
+}
+#endif
+
 // scalar.cpp
 #if defined(ARROW_R_WITH_ARROW)
 std::shared_ptr Array__GetScalar(const 
std::shared_ptr& x, int64_t i);
@@ -8146,6 +8177,8 @@ static const R_CallMethodDef CallEntries[] = {
{ "_arrow_ipc___RecordBatchWriter__Close", (DL_FUNC) 
&_arrow_ipc___RecordBatchWriter__Close, 1}, 
{ "_arrow_ipc___RecordBatchFileWriter__Open", (DL_FUNC) 
&_arrow_ipc___RecordBatchFileWriter__Open, 4}, 
{ "_arrow_ipc___RecordBatchStreamWriter__Open", (DL_FUNC) 
&_arrow_ipc___RecordBatchStreamW

[arrow] branch master updated (a1a255b -> a1f32fa)

2022-03-31 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from a1a255b  MINOR: [R] add some verbosity to homebrew tests
 add a1f32fa  MINOR: [R] Avoid {glue}'s whitespace trimming

No new revisions were added by this update.

Summary of changes:
 r/data-raw/codegen.R | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


[arrow] branch master updated (d4798ef -> a1a255b)

2022-03-31 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d4798ef  ARROW-16061: [R] [CI] Speed up windows 3.6 builds
 add a1a255b  MINOR: [R] add some verbosity to homebrew tests

No new revisions were added by this update.

Summary of changes:
 dev/tasks/r/github.macos.brew.yml | 1 +
 1 file changed, 1 insertion(+)


[arrow] branch master updated (4a90e39 -> d4798ef)

2022-03-31 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 4a90e39  ARROW-16078: Upgrade bundled zlib to 1.2.12
 add d4798ef  ARROW-16061: [R] [CI] Speed up windows 3.6 builds

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)


[arrow] branch master updated (64560af -> ba04e7f)

2022-03-30 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 64560af  ARROW-16053: [C++][FlightRPC] Fix flaky test 
TestAuthHandler.FailUnauthenticatedCalls
 add ba04e7f  ARROW-15659 [R] strptime should return NA (not error) with 
format mismatch

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R   |  2 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 42 
 2 files changed, 43 insertions(+), 1 deletion(-)


[arrow] branch master updated (6f9b07a -> 4d0436a)

2022-03-29 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6f9b07a  ARROW-15975: [C++] Document type traits and inline visitors
 add 4d0436a  ARROW-15818: [R] Implement initial Substrait consumer in the 
R bindings

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE  |  1 +
 r/R/arrow-package.R  |  9 
 r/R/arrowExports.R   | 12 +
 r/R/query-engine.R   | 13 +
 r/configure  |  8 +++
 r/configure.win  |  9 +++-
 r/data-raw/codegen.R |  2 +-
 r/man/arrow_available.Rd |  3 ++
 r/src/arrowExports.cpp   | 62 ++-
 r/src/compute-exec.cpp   | 98 
 r/tests/testthat/test-query-engine.R | 63 +++
 11 files changed, 276 insertions(+), 4 deletions(-)
 create mode 100644 r/tests/testthat/test-query-engine.R


[arrow] branch master updated (ad7380e -> b781710)

2022-03-28 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from ad7380e  ARROW-15857: [R] rhub/fedora-clang-devel fails to install 
'sass' (rmarkdown dependency)
 add b781710  ARROW-15947: [R] rename_with s3 method for arrow_dplyr_query

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE  |  1 +
 r/R/arrow-package.R  |  4 +--
 r/R/dplyr-select.R   |  7 +
 r/tests/testthat/test-dplyr-select.R | 54 ++--
 4 files changed, 61 insertions(+), 5 deletions(-)


[arrow] branch master updated (6a0770c -> ad7380e)

2022-03-28 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6a0770c  ARROW-15665: [C++] Fix error_is_null in strptime with invalid 
inputs
 add ad7380e  ARROW-15857: [R] rhub/fedora-clang-devel fails to install 
'sass' (rmarkdown dependency)

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_docker_configure.sh | 3 +++
 1 file changed, 3 insertions(+)


[arrow] branch master updated (919d113 -> f4dfd6c)

2022-03-28 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 919d113  ARROW-13564: [Dev] Check individual commit messages for 
"Co-authored-by:" tags when integrating a pull request
 add f4dfd6c  ARROW-13168: [C++][R] Enable runtime timezone database for 
Windows

No new revisions were added by this update.

Summary of changes:
 .github/workflows/cpp.yml  |  6 +++
 ci/appveyor-cpp-setup.bat  | 14 ++
 ..._cgo_python_test.sh => download_tz_database.sh} | 29 +++-
 .../arrow/compute/kernels/scalar_cast_string.cc|  7 ---
 cpp/src/arrow/compute/kernels/scalar_cast_test.cc  | 43 ++
 .../arrow/compute/kernels/scalar_temporal_test.cc  | 19 +---
 .../arrow/compute/kernels/scalar_temporal_unary.cc | 13 --
 cpp/src/arrow/config.cc| 28 
 cpp/src/arrow/config.h | 18 
 cpp/src/arrow/public_api_test.cc   | 44 +++
 cpp/src/arrow/testing/util.cc  | 20 +
 cpp/src/arrow/testing/util.h   |  8 
 docs/source/cpp/api/support.rst| 10 +
 docs/source/cpp/build_system.rst   | 23 ++
 docs/source/developers/cpp/windows.rst |  9 
 r/DESCRIPTION  |  1 +
 r/R/arrow-package.R|  8 
 r/R/arrowExports.R |  4 ++
 r/R/dplyr-funcs-datetime.R | 15 +--
 r/src/arrowExports.cpp | 17 
 r/src/config.cpp   | 13 ++
 r/tests/testthat/test-dplyr-funcs-datetime.R   | 51 +-
 22 files changed, 281 insertions(+), 119 deletions(-)
 copy ci/scripts/{go_cgo_python_test.sh => download_tz_database.sh} (65%)
 mode change 100755 => 100644


[arrow] branch master updated (6ab947b -> d327f69)

2022-03-28 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6ab947b  ARROW-15321: [Dev][Python] Also numpydoc-validate 
Cython-generated methods
 add d327f69  ARROW-15814: [R][DOCS] Improve documentation for cast()

No new revisions were added by this update.

Summary of changes:
 r/R/type.R | 22 ++
 r/man/data-type.Rd | 22 ++
 2 files changed, 44 insertions(+)


[arrow] branch master updated (bfa3bca -> 5bd4d8e)

2022-03-25 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from bfa3bca  ARROW-15313: [C++][Java][FlightRPC] Implement type info 
method to flight-sql
 add 5bd4d8e  ARROW-16007: [R] grepl bindings return FALSE for NA inputs

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-string.R   | 44 ---
 r/tests/testthat/test-dplyr-funcs-string.R | 58 ++
 2 files changed, 83 insertions(+), 19 deletions(-)


[arrow] branch master updated (a17137f -> e83ef42)

2022-03-24 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from a17137f  ARROW-15921: [Format][FlightRPC][C++][Java] Clarify 
interpretation of FlightEndpoint.locations
 add e83ef42  ARROW-15098 [R] Add binding for `lubridate::duration()` 
and/or `as.difftime()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|   1 +
 r/R/dplyr-funcs-datetime.R   |  64 +
 r/R/dplyr-funcs.R|   1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 137 +++
 4 files changed, 203 insertions(+)


[arrow] branch master updated (ad2fb74 -> 5073d63)

2022-03-22 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from ad2fb74  ARROW-15960: [C++] Fix crash on adaptive int builder edge 
cases
 add 5073d63  MINOR: [R] Run the styler

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-datetime.R | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)


[arrow] branch master updated (b0d6e27 -> 5bd9943)

2022-03-21 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from b0d6e27  ARROW-15544: [Go][Parquet] Fix origin schema base64 decoding
 add 5bd9943  ARROW-15489: [R] Expand RecordBatchReader usability

No new revisions were added by this update.

Summary of changes:
 r/R/record-batch-reader.R   | 14 +-
 r/tests/testthat/test-record-batch-reader.R | 28 
 2 files changed, 41 insertions(+), 1 deletion(-)


[arrow] branch master updated (acda3c6 -> de02cfc)

2022-03-21 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from acda3c6  ARROW-14679: [R] [C++] Handle suffix argument in joins
 add de02cfc  ARROW-15802 [R] bindings for `lubridate::make_datetime()` and 
`lubridate::make_date()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|   1 +
 r/R/dplyr-funcs-datetime.R   |  46 
 r/tests/testthat/test-dplyr-funcs-datetime.R | 102 +++
 3 files changed, 149 insertions(+)


[arrow] branch master updated (70b8a82 -> acda3c6)

2022-03-21 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 70b8a82  ARROW-15919: [C++] Add function not commutative with 
timestamps & duration maths
 add acda3c6  ARROW-14679: [R] [C++] Handle suffix argument in joins

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R |  5 ++-
 r/R/dplyr-collect.R| 31 +--
 r/R/dplyr-join.R   |  4 +--
 r/R/query-engine.R | 21 +++--
 r/src/arrowExports.cpp | 12 +---
 r/src/compute-exec.cpp |  9 --
 r/tests/testthat/test-dplyr-join.R | 63 +-
 7 files changed, 120 insertions(+), 25 deletions(-)


[arrow] branch master updated (a7f91ec -> ae93d12)

2022-03-18 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from a7f91ec  ARROW-15296: [CI][GO] Add Go staticcheck linting to CI lint 
job
 add ae93d12  ARROW-15627: [R] Fix union dataset unify schema

No new revisions were added by this update.

Summary of changes:
 r/R/dataset.R   | 21 +++--
 r/tests/testthat/test-dataset.R | 51 +
 2 files changed, 65 insertions(+), 7 deletions(-)


[arrow] branch master updated (93ea682 -> 74200f5)

2022-03-18 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 93ea682  ARROW-15929: [R] io_thread_count is actually the CPU thread 
count
 add 74200f5  ARROW-15875: [R] Expose ReadMetadata for input streams

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R |  4 
 r/R/io.R   |  3 +++
 r/src/arrowExports.cpp | 26 +-
 r/src/io.cpp   | 24 
 r/tests/testthat/test-io.R | 12 
 r/tests/testthat/test-s3.R |  7 +++
 6 files changed, 71 insertions(+), 5 deletions(-)


[arrow] branch master updated (d459311 -> 93ea682)

2022-03-18 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d459311  MINOR: [Docs] Update architectural_overview.rst
 add 93ea682  ARROW-15929: [R] io_thread_count is actually the CPU thread 
count

No new revisions were added by this update.

Summary of changes:
 r/R/config.R   |  4 
 r/src/threadpool.cpp   |  5 ++--
 .../tests/testthat/test-config.R   | 28 +++---
 3 files changed, 26 insertions(+), 11 deletions(-)
 copy ci/scripts/install_ceph.sh => r/tests/testthat/test-config.R (52%)
 mode change 100755 => 100644


[arrow] branch master updated: MINOR: [R] [CI] Disable the DuckDB dev tests that are failing

2022-03-18 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e13c2d  MINOR: [R] [CI] Disable the DuckDB dev tests that are failing
8e13c2d is described below

commit 8e13c2ddcf589fb03bc239098f8ade329c283b50
Author: Jonathan Keane 
AuthorDate: Fri Mar 18 10:38:43 2022 -0500

MINOR: [R] [CI] Disable the DuckDB dev tests that are failing

This is being tracked at https://github.com/duckdb/duckdb/issues/3258, and 
we have a follow-up to re-enable the tests: 
https://issues.apache.org/jira/browse/ARROW-15970

Closes #12666 from jonkeane/disable-dev-duckdb

Authored-by: Jonathan Keane 
Signed-off-by: Jonathan Keane 
---
 dev/tasks/tasks.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml
index 0992aa1..2fd5de6 100644
--- a/dev/tasks/tasks.yml
+++ b/dev/tasks/tasks.yml
@@ -148,6 +148,8 @@ groups:
 - example-*
 - wheel-*
 - python-sdist
+# ARROW-15970 and duckdb/duckdb#3258
+- ~test-r-dev-duckdb
 
 tasks:
   # arbitrary_task_name:


[arrow] branch master updated: ARROW-14199 [R] bindings for format (where possible)

2022-03-11 Thread jonkeane

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new dc2e0b2  ARROW-14199 [R] bindings for format (where possible)
dc2e0b2 is described below

commit dc2e0b2e44fdaa3d5ad0bb358ff8ce9db3bc7416
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Fri Mar 11 10:34:40 2022 -0600

ARROW-14199 [R] bindings for format (where possible)
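
For `usetz = TRUE`, the binding in this patch appends `%Z` to the format 
string. A minimal base R sketch of why that matches R's own `format()`, 
assuming a UTC timestamp (the value and format string are only illustrative):

``` r
# Illustrative timestamp; UTC avoids any dependence on the local time zone
x <- as.POSIXct("2018-10-07 19:04:05", tz = "UTC")
fmt <- "%Y-%m-%d %H:%M"

# Base R with usetz = TRUE ...
with_usetz <- format(x, format = fmt, usetz = TRUE)
# ... versus appending "%Z" to the format, as the binding does
with_z <- format(x, format = paste(fmt, "%Z"))

with_usetz
#> [1] "2018-10-07 19:04 UTC"
identical(with_usetz, with_z)
#> [1] TRUE
```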

Closes #12319 from dragosmg/format_bindings

Authored-by: Dragoș Moldovan-Grünfeld 
Signed-off-by: Jonathan Keane 
---
 r/R/dplyr-funcs-datetime.R   |  20 ++
 r/R/dplyr-funcs-type.R   |  18 ++
 r/tests/testthat/test-dplyr-funcs-type.R | 102 +++
 3 files changed, 140 insertions(+)

diff --git a/r/R/dplyr-funcs-datetime.R b/r/R/dplyr-funcs-datetime.R
index 8f5a768..ea25f62 100644
--- a/r/R/dplyr-funcs-datetime.R
+++ b/r/R/dplyr-funcs-datetime.R
@@ -189,3 +189,23 @@ register_bindings_datetime <- function() {
 build_expr("cast", x, options = list(to_type = date32()))
   })
 }
+
+binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
+  if (usetz) {
+format <- paste(format, "%Z")
+  }
+
+  if (call_binding("is.POSIXct", x)) {
+# the casting part might not be required once
+# https://issues.apache.org/jira/browse/ARROW-14442 is solved
+# TODO revisit the steps below once the PR for that issue is merged
+if (tz == "" && x$type()$timezone() != "") {
+  tz <- x$type()$timezone()
+} else if (tz == "") {
+  tz <- Sys.timezone()
+}
+x <- build_expr("cast", x, options = cast_options(to_type = 
timestamp(x$type()$unit(), tz)))
+  }
+
+  build_expr("strftime", x, options = list(format = format, locale = 
Sys.getlocale("LC_TIME")))
+}
diff --git a/r/R/dplyr-funcs-type.R b/r/R/dplyr-funcs-type.R
index 7fd3a7c..1bb633d 100644
--- a/r/R/dplyr-funcs-type.R
+++ b/r/R/dplyr-funcs-type.R
@@ -20,6 +20,7 @@ register_bindings_type <- function() {
   register_bindings_type_cast()
   register_bindings_type_inspect()
   register_bindings_type_elementwise()
+  register_bindings_type_format()
 }
 
 register_bindings_type_cast <- function() {
@@ -292,3 +293,20 @@ register_bindings_type_elementwise <- function() {
 is_inf & !call_binding("is.na", is_inf)
   })
 }
+
+register_bindings_type_format <- function() {
+  register_binding("format", function(x, ...) {
+# We use R's format if we get a single R object here since we don't (yet)
+# support all of the possible options for casting to string
+if (!inherits(x, "Expression")) {
+  return(format(x, ...))
+}
+
+if (inherits(x, "Expression") &&
+x$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")]) {
+  binding_format_datetime(x, ...)
+} else {
+  build_expr("cast", x, options = cast_options(to_type = string()))
+}
+  })
+}
diff --git a/r/tests/testthat/test-dplyr-funcs-type.R 
b/r/tests/testthat/test-dplyr-funcs-type.R
index 9570ece..6c9d9ac 100644
--- a/r/tests/testthat/test-dplyr-funcs-type.R
+++ b/r/tests/testthat/test-dplyr-funcs-type.R
@@ -874,3 +874,105 @@ test_that("as.Date() converts successfully from date, 
timestamp, integer, char a
 test_df
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = 
"Pacific/Marquesas"), NA),
+date = c(as.Date("2021-01-01"), NA)
+  )
+  formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %% 
%G %V %u"
+  formats_date <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %j %U %W %x %X %% %G 
%V %u"
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(x = format(datetime, format = formats)) %>%
+  collect(),
+times
+  )
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(x = format(date, format = formats_date)) %>%
+  collect(),
+times
+  )
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(x = format(datetime, format = formats, tz = "Europe/Bucharest")) 
%>%
+  collect(),
+times
+  )
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(x = format(datetime, format = formats, tz = "EST", usetz = TRUE)) 
%>%
+  collect(),
+times
+  )
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(x = format(1),
+ y = format(13.7, nsmall = 3)) %>%
+  collect(),
+times
+  )
+
+  compare_dplyr_binding(
+.input

[arrow] branch master updated (a76794c -> 1b77e6d)

2022-03-09 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from a76794c  ARROW-15864: [Java][Docs] Update Arrow nightly Maven releases 
documentation
 add 1b77e6d  ARROW-15701 [R] month() should allow integer inputs

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  8 +--
 r/R/dplyr-funcs-datetime.R   | 27 -
 r/tests/testthat/test-dplyr-funcs-datetime.R | 90 
 3 files changed, 106 insertions(+), 19 deletions(-)


[arrow] branch master updated (7aecc83 -> 5772d65)

2022-03-08 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 7aecc83  ARROW-15847: [Python] Building with Parquet but without 
Parquet encryption fails
 add 5772d65  ARROW-15775 [R] Clean up as.* methods to use build_expr()

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-type.R   | 12 +-
 r/tests/testthat/test-dplyr-funcs-type.R | 39 
 2 files changed, 41 insertions(+), 10 deletions(-)


[arrow] branch master updated (28b7725 -> f5a0caf)

2022-03-04 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 28b7725  ARROW-15844: [Release][Packaging] Use ASCII format for 
detached sign
 add f5a0caf  ARROW-15743: [R] `skip` not connected up to `skip_rows` on 
open_dataset despite error messages indicating otherwise

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-format.R| 69 +
 r/tests/testthat/test-dataset-csv.R | 15 
 2 files changed, 69 insertions(+), 15 deletions(-)


[arrow] branch master updated (ce46c1a -> 9719eae)

2022-03-03 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from ce46c1a  ARROW-15831: [Java] Upgrade Flight dependencies
 add 9719eae  ARROW-14808 [R] Implement bindings for `lubridate::date()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  3 +
 r/R/dplyr-funcs-datetime.R   |  3 +
 r/R/dplyr-funcs-type.R   | 44 ++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 85 
 r/tests/testthat/test-dplyr-funcs-type.R | 75 
 5 files changed, 210 insertions(+)


[arrow] branch master updated (cf0b21c -> 6cf79d6)

2022-03-02 Thread jonkeane

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from cf0b21c  MINOR: [R][DOCS] fix typo
 add 6cf79d6  MINOR: [R] Fix errant trailing whitespace

No new revisions were added by this update.

Summary of changes:
 r/R/scalar.R| 2 +-
 r/man/Scalar.Rd | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)


[arrow] branch master updated (632f4e9 -> 16d0c8a)

2022-03-01 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 632f4e9  MINOR: [R] Fix cheatsheet url in the r folder readme
 add 16d0c8a  MINOR: [R][DOCS] Replace GitHub issue numbers to JIRA issue numbers in the Changelog

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


[arrow] branch master updated (fffdca2 -> acfd1d2)

2022-02-24 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from fffdca2  ARROW-15258: [C++] Easy options to create a source node from a table
 add acfd1d2  ARROW-15697: [R] Add logo and meta tags to pkgdown site

No new revisions were added by this update.

Summary of changes:
 r/_pkgdown.yml | 8 
 1 file changed, 8 insertions(+)


[arrow] branch master updated (a916e60 -> effed6b)

2022-02-23 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from a916e60  MINOR: [R][DOCS] Fix link
 add effed6b  ARROW-15673 [R] Error gracefully if DuckDB isn't installed

No new revisions were added by this update.

Summary of changes:
 dev/tasks/docker-tests/github.linux.yml |  9 +
 r/R/duckdb.R|  4 
 r/tests/testthat/_snaps/duckdb.md   |  7 +++
 r/tests/testthat/test-duckdb.R  | 19 +--
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 r/tests/testthat/_snaps/duckdb.md


[arrow] branch master updated (6aa30703 -> 194ace5)

2022-02-23 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6aa30703 ARROW-15604: [C++][CI] Sporadic ThreadSanitizer failure with OpenTracing
 add 194ace5  ARROW-14826 [R] Implement bindings for `lubridate::dst()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  1 +
 r/R/expression.R |  1 +
 r/tests/testthat/test-dplyr-funcs-datetime.R | 15 +++
 3 files changed, 17 insertions(+)


[arrow] branch master updated (5680d20 -> 16f36a5)

2022-02-22 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 5680d20  ARROW-15727: [Python] Allow converting lists of MonthDayNano intervals to Pandas
 add 16f36a5  ARROW-14815 [R] bindings for `lubridate::semester()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  1 +
 r/R/dplyr-funcs-datetime.R   | 10 +++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 41 ++--
 3 files changed, 50 insertions(+), 2 deletions(-)


[arrow] branch master updated (5216c2b -> 0eaafe8)

2022-02-22 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 5216c2b  ARROW-15348: [Doc][Guide] Lifecycle of a PR - minor corrections
 add 0eaafe8  ARROW-14817 [R] Implement bindings for `lubridate::tz()`

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md|  3 +++
 r/R/dplyr-funcs-datetime.R   |  6 +
 r/R/type.R   |  2 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 38 
 r/tests/testthat/test-type.R |  9 ---
 5 files changed, 54 insertions(+), 4 deletions(-)


[arrow] branch master updated (5bedee4 -> f9f2c08)

2022-02-19 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 5bedee4  MINOR: [Docs] Update contributing.rst
 add f9f2c08  ARROW-15708: [R] [CI] skip snappy encoded parquets on clang sanitizer

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_sanitize.sh   | 3 +++
 r/tests/testthat.R | 5 -
 r/tests/testthat/test-dataset.R| 5 -
 r/tests/testthat/test-dplyr-join.R | 5 +
 4 files changed, 16 insertions(+), 2 deletions(-)


[arrow] branch master updated (ee9354d -> 3ce4f81)

2022-02-16 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from ee9354d  ARROW-15690: [Dev] Update GitHub Actions workflows that hardcode master as default
 add 3ce4f81  ARROW-15468: [R] [CI] A crossbow job that tests against DuckDB's dev branch

No new revisions were added by this update.

Summary of changes:
 ci/docker/linux-apt-r.dockerfile |  3 +++
 ci/scripts/r_deps.sh | 12 ++--
 dev/tasks/r/azure.linux.yml  |  6 ++
 dev/tasks/tasks.yml  | 27 +--
 docker-compose.yml   |  4 
 r/tests/testthat/test-duckdb.R   | 20 ++--
 6 files changed, 50 insertions(+), 22 deletions(-)


[arrow] branch master updated (6a2ee11 -> cca3800)

2022-02-15 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6a2ee11  PARQUET-2124: [C++] Remove Parquet Dictionary DCHECK
 add cca3800  ARROW-15013: [R] Expose concatenate at the R level

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE   |  2 ++
 r/R/array.R   | 43 +++
 r/R/arrowExports.R|  4 +++
 r/_pkgdown.yml|  1 +
 r/man/concat_arrays.Rd| 34 +
 r/src/array.cpp   | 13 
 r/src/arrowExports.cpp| 16 ++
 r/tests/testthat/test-Array.R | 69 +++
 8 files changed, 182 insertions(+)
 create mode 100644 r/man/concat_arrays.Rd


[arrow] branch master updated (699449f -> 5ad5ddc)

2022-02-14 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 699449f  MINOR: [Docs][Archery] Correct the links in the README.md
 add 5ad5ddc  ARROW-15606: [CI] [R] Add brew build that exercises the R package

No new revisions were added by this update.

Summary of changes:
 dev/tasks/homebrew-formulae/apache-arrow.rb|  4 +++
 dev/tasks/homebrew-formulae/github.macos.yml   | 42 +++---
 dev/tasks/macros.jinja | 29 +++
 ...ub.macos.autobrew.yml => github.macos.brew.yml} | 19 --
 dev/tasks/tasks.yml|  9 +
 5 files changed, 53 insertions(+), 50 deletions(-)
 copy dev/tasks/r/{github.macos.autobrew.yml => github.macos.brew.yml} (70%)


[arrow] branch master updated (7018a4b -> 0a56006)

2022-02-08 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 7018a4b  ARROW-15595: [Release][Ruby] Add support for MFA
 add 0a56006  ARROW-15020: [R] Add bindings for new dataset writing options

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R|   4 +-
 r/R/dataset-write.R   |  36 +++-
 r/man/write_dataset.Rd|  23 +
 r/src/arrowExports.cpp|  14 +--
 r/src/dataset.cpp |   8 +-
 r/tests/testthat/test-dataset-write.R | 155 ++
 6 files changed, 228 insertions(+), 12 deletions(-)


[arrow] branch master updated (6fa5891 -> 43efadb)

2022-02-08 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 6fa5891  MINOR: [R] Update a URL to https
 add 43efadb  MINOR: [R] run document(), document missing parameters

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE|  3 +++
 r/R/type.R |  5 -
 r/man/FileFormat.Rd|  2 +-
 r/man/array.Rd |  1 +
 r/man/data-type.Rd |  9 +
 r/man/open_dataset.Rd  |  2 +-
 r/src/arrowExports.cpp | 10 +-
 7 files changed, 24 insertions(+), 8 deletions(-)


[arrow] branch master updated (11caf00 -> ee7897a)

2022-02-07 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 11caf00  ARROW-15570: [CI][Nightly] Drop centos-8 R nightly job
 add ee7897a  ARROW-15605: [CI] [R] Keep using old macos runners on our autobrew CI job

No new revisions were added by this update.

Summary of changes:
 dev/tasks/r/github.macos.autobrew.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[arrow] branch master updated (858470d -> 11caf00)

2022-02-07 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 858470d  ARROW-14745: [R] Enable true duckdb streaming
 add 11caf00  ARROW-15570: [CI][Nightly] Drop centos-8 R nightly job

No new revisions were added by this update.

Summary of changes:
 dev/tasks/tasks.yml | 1 -
 1 file changed, 1 deletion(-)


[arrow] branch master updated (501d92e -> 858470d)

2022-02-07 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 501d92e  ARROW-15080: [Python][C++] Enable tuples conversion to interval
 add 858470d  ARROW-14745: [R] Enable true duckdb streaming

No new revisions were added by this update.

Summary of changes:
 r/R/arrow-package.R |  2 +-
 r/R/dplyr-arrange.R |  2 +-
 r/R/dplyr-collect.R |  8 +++---
 r/R/dplyr-count.R   |  4 +--
 r/R/dplyr-distinct.R|  2 +-
 r/R/dplyr-filter.R  |  2 +-
 r/R/dplyr-group-by.R|  8 +++---
 r/R/dplyr-join.R| 12 -
 r/R/dplyr-mutate.R  |  4 +--
 r/R/dplyr-select.R  |  6 ++---
 r/R/dplyr-summarize.R   |  2 +-
 r/R/duckdb.R| 19 -
 r/man/to_arrow.Rd   |  8 --
 r/tests/testthat/test-dataset.R | 54 -
 r/tests/testthat/test-duckdb.R  | 59 -
 15 files changed, 144 insertions(+), 48 deletions(-)


[arrow] branch master updated (d403fd5 -> b39c5a0)

2022-02-03 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d403fd5  ARROW-15532: [C++] Fix unused warning for StringClassifyDoc
 add b39c5a0  ARROW-14169: [R] altrep for factors

No new revisions were added by this update.

Summary of changes:
 r/src/altrep.cpp   | 371 ++---
 r/src/array_to_vector.cpp  |  41 ++---
 r/src/arrow_types.h|   3 +
 r/tests/testthat/test-altrep.R |  22 +++
 4 files changed, 396 insertions(+), 41 deletions(-)


[arrow] branch master updated (d885d82 -> f48cabe)

2022-02-03 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d885d82  ARROW-15546: [FlightRPC][C++] Remove quotes from cookie header
 add f48cabe  ARROW-15480: [R] Expand on schema/colnames mismatch error messages

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-format.R| 30 --
 r/tests/testthat/test-dataset-csv.R | 26 +-
 2 files changed, 49 insertions(+), 7 deletions(-)


[arrow] branch master updated (fb5a4f6 -> c89e67d)

2022-02-02 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from fb5a4f6  ARROW-15520: [C++] Qualify `arrow_vendored::date::format()` for C++20 compatibility
 add c89e67d  ARROW-15539: [Archery] Add ARROW_JEMALLOC to build options

No new revisions were added by this update.

Summary of changes:
 dev/archery/archery/lang/cpp.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


[arrow] branch master updated (d747326 -> d4e16a5)

2022-01-31 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from d747326  ARROW-14095: [C++] subtract(timestamp, duration) -> timestamp kernel
 add d4e16a5  MINOR: [R] Fix misalignment in arrow.Rmd vignette

No new revisions were added by this update.

Summary of changes:
 r/vignettes/arrow.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[arrow] branch master updated (c5b757f -> f92219d)

2022-01-28 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from c5b757f  ARROW-14419 [R] Add filter + join test
 add f92219d  ARROW-10456: [R] Implement MapType and MapArray

No new revisions were added by this update.

Summary of changes:
 r/R/array.R   |  14 +++
 r/R/arrowExports.R|  61 --
 r/R/type.R|  18 +++
 r/src/array.cpp   |  28 +
 r/src/array_to_vector.cpp |  15 ++-
 r/src/arrowExports.cpp| 238 --
 r/src/datatype.cpp|  59 ++
 r/tests/testthat/test-Array.R |  23 
 r/tests/testthat/test-data-type.R |  38 ++
 r/tests/testthat/test-parquet.R   |  30 +
 r/tests/testthat/test-type.R  |   8 ++
 r/vignettes/arrow.Rmd |   2 +-
 12 files changed, 482 insertions(+), 52 deletions(-)


[arrow] branch master updated (07ec0a1 -> c5b757f)

2022-01-28 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 07ec0a1  ARROW-14461 [R] write_dataset() allows users to pass invalid additional arguments
 add c5b757f  ARROW-14419 [R] Add filter + join test

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/test-dplyr-join.R | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)


[arrow] branch master updated (39367db -> 07ec0a1)

2022-01-28 Thread jonkeane
This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git.


from 39367db  ARROW-15126: [C++] Support Null type as group keys
 add 07ec0a1  ARROW-14461 [R] write_dataset() allows users to pass invalid additional arguments

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-format.R | 42 ++-
 r/R/dataset-write.R  |  4 +--
 r/man/ChunkedArray.Rd|  1 +
 r/man/RecordBatch.Rd |  1 +
 r/man/Table.Rd   |  1 +
 r/man/array.Rd   |  1 +
 r/man/write_dataset.Rd   |  4 +--
 r/tests/testthat/_snaps/dataset-write.md | 49 
 r/tests/testthat/test-dataset-write.R| 34 ++
 9 files changed, 132 insertions(+), 5 deletions(-)
 create mode 100644 r/tests/testthat/_snaps/dataset-write.md

