(arrow) branch main updated (da0eb7e9fc -> 6800be9331)

2024-05-29 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from da0eb7e9fc MINOR: [Swift] cleanup some go and C++ artifacts (#41878)
 add 6800be9331 MINOR: [R] Remove writing_bindings from _pkgdown.yml (#41877)

No new revisions were added by this update.

Summary of changes:
 r/_pkgdown.yml | 1 -
 1 file changed, 1 deletion(-)



(arrow) branch main updated (4a2df663bc -> 774ee0f2fe)

2024-05-29 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 4a2df663bc GH-41675: [Packaging][MATLAB] Add crossbow job to package MATLAB interface on macos-14 (#41677)
 add 774ee0f2fe GH-41834: [R] Better error handling in dplyr code (#41576)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-across.R   |   6 +-
 r/R/dplyr-arrange.R  |  87 
 r/R/dplyr-datetime-helpers.R |  31 +--
 r/R/dplyr-eval.R | 182 +---
 r/R/dplyr-filter.R   |  64 +++---
 r/R/dplyr-funcs-agg.R|   6 +-
 r/R/dplyr-funcs-conditional.R|  16 +-
 r/R/dplyr-funcs-datetime.R   |  18 +-
 r/R/dplyr-funcs-simple.R |   2 +-
 r/R/dplyr-funcs-string.R |  76 ---
 r/R/dplyr-funcs-type.R   |   7 +-
 r/R/dplyr-mutate.R   | 190 +
 r/R/dplyr-slice.R|   2 +-
 r/R/dplyr-summarize.R|  70 ++-
 r/R/dplyr.R  |  16 --
 r/man/arrow_not_supported.Rd |  56 +
 r/tests/testthat/_snaps/dataset-dplyr.md |   9 +
 r/tests/testthat/_snaps/dplyr-across.md  |  11 +
 r/tests/testthat/_snaps/dplyr-eval.md|  27 +++
 r/tests/testthat/_snaps/dplyr-funcs-datetime.md  |  11 +
 r/tests/testthat/_snaps/dplyr-mutate.md  |  25 +++
 r/tests/testthat/_snaps/dplyr-query.md   |   4 +-
 r/tests/testthat/_snaps/dplyr-summarize.md   |  41 +++-
 r/tests/testthat/helper-expectation.R|   7 +-
 r/tests/testthat/test-dataset-dplyr.R|   5 +-
 r/tests/testthat/test-dplyr-across.R |  12 +-
 r/tests/testthat/test-dplyr-collapse.R   |  13 --
 r/tests/testthat/test-dplyr-eval.R   |  60 ++
 r/tests/testthat/test-dplyr-filter.R |  20 +-
 r/tests/testthat/test-dplyr-funcs-conditional.R  | 107 --
 r/tests/testthat/test-dplyr-funcs-datetime.R |  46 +
 r/tests/testthat/test-dplyr-funcs-string.R   |  79 ---
 r/tests/testthat/test-dplyr-mutate.R |  13 +-
 r/tests/testthat/test-dplyr-summarize.R  |  55 ++---
 r/vignettes/developers/matchsubstringoptions.png | Bin 89899 -> 0 bytes
 r/vignettes/developers/starts_with_docs.png  | Bin 9720 -> 0 bytes
 r/vignettes/developers/startswithdocs.png| Bin 42409 -> 0 bytes
 r/vignettes/developers/writing_bindings.Rmd  | 253 ---
 38 files changed, 804 insertions(+), 823 deletions(-)
 create mode 100644 r/man/arrow_not_supported.Rd
 create mode 100644 r/tests/testthat/_snaps/dataset-dplyr.md
 create mode 100644 r/tests/testthat/_snaps/dplyr-across.md
 create mode 100644 r/tests/testthat/_snaps/dplyr-eval.md
 create mode 100644 r/tests/testthat/_snaps/dplyr-funcs-datetime.md
 create mode 100644 r/tests/testthat/_snaps/dplyr-mutate.md
 create mode 100644 r/tests/testthat/test-dplyr-eval.R
 delete mode 100644 r/vignettes/developers/matchsubstringoptions.png
 delete mode 100644 r/vignettes/developers/starts_with_docs.png
 delete mode 100644 r/vignettes/developers/startswithdocs.png
 delete mode 100644 r/vignettes/developers/writing_bindings.Rmd



(arrow) branch main updated: GH-41540: [R] Simplify arrow_eval() logic and bindings environments (#41537)

2024-05-07 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new 03f8ae754e GH-41540: [R] Simplify arrow_eval() logic and bindings environments (#41537)
03f8ae754e is described below

commit 03f8ae754ede16f118ccdba0abb593b1461024aa
Author: Neal Richardson 
AuthorDate: Tue May 7 09:42:55 2024 -0400

GH-41540: [R] Simplify arrow_eval() logic and bindings environments (#41537)

### Rationale for this change

NSE is hard enough. I wanted to see if I could remove some layers of
complexity.

### What changes are included in this PR?

* There are no longer separate collections of `agg_funcs` and `nse_funcs`. Now that the aggregation functions return Expressions
(https://github.com/apache/arrow/pull/41223), there's no reason to treat
them separately. All bindings return Expressions now.
* Both are removed and functions are just stored in `.cache$functions`.
There was a note wondering why both `nse_funcs` and that needed to
exist. They don't.
* `arrow_mask()` no longer has an `aggregations` argument: agg functions
are always present.
* Because agg functions are always present, `filter` and `arrange` now
have to check for whether the expressions passed to them contain
aggregations--this is supported in regular dplyr but we have deferred
supporting it here for now (see
https://github.com/apache/arrow/pull/41350). If we decide we want to
support it later, these checks are the entry points where we'd drop in
the `left_join()` as in `mutate()`.
* The logic of evaluating expressions in `filter()` has been simplified.
* Assorted other cleanups: `register_binding()` has two fewer arguments,
for example, and the duplicate functions for referencing agg_funcs are
gone.
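The bindings-environment design described in these bullets can be sketched with base R alone. This is an illustrative sketch only: the names `cache`, `make_mask`, and `add_one` are invented for the example, and arrow's real mask is built with rlang and returns Expressions rather than values.

```r
# All "bindings" live in one cache environment; evaluation happens against a
# mask environment that exposes the data columns and chains up to the bindings.
cache <- new.env()
cache$functions <- list(
  add_one = function(x) x + 1  # stand-in for a registered binding
)

make_mask <- function(data) {
  # bindings sit in the parent env, data columns in the child ("the mask")
  f_env <- list2env(cache$functions, parent = emptyenv())
  mask <- list2env(data, parent = f_env)
  # aggregations are collected here as expressions are evaluated
  mask$.aggregations <- list()
  mask
}

mask <- make_mask(list(x = 1:3))
eval(quote(add_one(x)), mask)  # 2 3 4
```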

There is one more refactor I intend to pursue, and that's to rework
abandon_ship and how arrow_eval does error handling, but I ~may~ will
defer that to a followup.

### Are these changes tested?

Yes, though I'll add some more for filter/aggregate in the followup
since I'm reworking things there.

### Are there any user-facing changes?

There are a couple of edge cases where the error message will change
subtly. For example, if you supplied a comma-separated list of filter
expressions, and more than one of them did not evaluate, previously you
would be informed of all of the failures; now, we error on the first
one. I don't think this is concerning.
* GitHub Issue: #41540
---
 r/R/dplyr-arrange.R |   8 ++
 r/R/dplyr-eval.R|  17 +---
 r/R/dplyr-filter.R  |  54 -
 r/R/dplyr-funcs-agg.R   |  26 +++---
 r/R/dplyr-funcs.R   | 119 ++--
 r/R/dplyr-mutate.R  |   2 +-
 r/R/dplyr-summarize.R   |   2 +-
 r/R/udf.R   |   7 +-
 r/man/register_binding.Rd   |  45 ++-
 r/tests/testthat/test-dataset-dplyr.R   |   2 +-
 r/tests/testthat/test-dplyr-filter.R|   9 ++-
 r/tests/testthat/test-dplyr-funcs.R |  30 +++
 r/tests/testthat/test-dplyr-summarize.R |  28 +++
 r/tests/testthat/test-udf.R |  14 ++--
 r/vignettes/developers/writing_bindings.Rmd |   7 +-
 15 files changed, 109 insertions(+), 261 deletions(-)

diff --git a/r/R/dplyr-arrange.R b/r/R/dplyr-arrange.R
index f91cd14211..c8594c77df 100644
--- a/r/R/dplyr-arrange.R
+++ b/r/R/dplyr-arrange.R
@@ -47,6 +47,14 @@ arrange.arrow_dplyr_query <- function(.data, ..., .by_group = FALSE) {
   msg <- paste("Expression", names(sorts)[i], "not supported in Arrow")
   return(abandon_ship(call, .data, msg))
 }
+if (length(mask$.aggregations)) {
+  # dplyr lets you arrange on e.g. x < mean(x), but we haven't implemented it.
+  # But we could, the same way it works in mutate() via join, if someone asks.
+  # Until then, just error.
+  # TODO: add a test for this
+  msg <- paste("Expression", format_expr(expr), "not supported in arrange() in Arrow")
+  return(abandon_ship(call, .data, msg))
+}
 descs[i] <- x[["desc"]]
   }
   .data$arrange_vars <- c(sorts, .data$arrange_vars)
diff --git a/r/R/dplyr-eval.R b/r/R/dplyr-eval.R
index ff1619ce94..211c26cecc 100644
--- a/r/R/dplyr-eval.R
+++ b/r/R/dplyr-eval.R
@@ -121,24 +121,9 @@ arrow_not_supported <- function(msg) {
 }
 
 # Create a data mask for evaluating a dplyr expression
-arrow_mask <- function(.data, aggregation = FALSE) {
+arrow_mask <- function(.data) {
   f_env <- new_environment(.cache$functions)

(arrow) branch main updated: MINOR: [R] fix no visible global function definition: left_join (#41542)

2024-05-06 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new d10ebf055a MINOR: [R] fix no visible global function definition: left_join (#41542)
d10ebf055a is described below

commit d10ebf055a393c94a693097db1dca08ff86745bd
Author: Neal Richardson 
AuthorDate: Mon May 6 09:28:22 2024 -0400

MINOR: [R] fix no visible global function definition: left_join (#41542)

### Rationale for this change

Followup to #41350; fixes a check NOTE that it caused.

### What changes are included in this PR?

`dplyr::` in two places.

### Are these changes tested?

Check will be clean.

### Are there any user-facing changes?


---
 r/R/dplyr-mutate.R | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/r/R/dplyr-mutate.R b/r/R/dplyr-mutate.R
index 880f7799e6..72882b6afd 100644
--- a/r/R/dplyr-mutate.R
+++ b/r/R/dplyr-mutate.R
@@ -84,12 +84,12 @@ mutate.arrow_dplyr_query <- function(.data,
 agg_query$aggregations <- mask$.aggregations
 agg_query <- collapse.arrow_dplyr_query(agg_query)
 if (length(grv)) {
-  out <- left_join(out, agg_query, by = grv)
+  out <- dplyr::left_join(out, agg_query, by = grv)
 } else {
   # If there are no group_by vars, add a scalar column to both and join on that
   agg_query$selected_columns[["..tempjoin"]] <- Expression$scalar(1L)
   out$selected_columns[["..tempjoin"]] <- Expression$scalar(1L)
-  out <- left_join(out, agg_query, by = "..tempjoin")
+  out <- dplyr::left_join(out, agg_query, by = "..tempjoin")
 }
   }
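The join strategy in this hunk can be illustrated with plain data frames. A sketch only: the column name `..tempjoin` matches the diff above, the other names here are invented for the example, and arrow performs this on Acero queries rather than data frames.

```r
library(dplyr)

df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 30))

# mutate(centered = x - mean(x)) per group, expressed as aggregate-then-join:
agg <- df |> group_by(g) |> summarize(mx = mean(x))
grouped <- df |> left_join(agg, by = "g") |> mutate(centered = x - mx)

# With no group_by vars, add a constant scalar column to both sides and join on it:
agg0 <- df |> summarize(mx = mean(x)) |> mutate(..tempjoin = 1L)
ungrouped <- df |> mutate(..tempjoin = 1L) |> left_join(agg0, by = "..tempjoin")
```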
 



(arrow) branch main updated (00df70c6dc -> 2ef4059566)

2024-04-29 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 00df70c6dc GH-41398: [R][CI] Windows job failing after R 4.4 release (#41409)
 add 2ef4059566 GH-29537: [R] Support mutate/summarize with implicit join (#41350)

No new revisions were added by this update.

Summary of changes:
 r/R/arrow-package.R   |  5 +--
 r/R/dplyr-funcs-agg.R |  1 -
 r/R/dplyr-funcs-doc.R |  2 +-
 r/R/dplyr-mutate.R| 39 +---
 r/man/acero.Rd|  2 +-
 r/tests/testthat/test-dataset-dplyr.R | 11 ---
 r/tests/testthat/test-dplyr-mutate.R  | 57 ---
 r/vignettes/data_wrangling.Rmd| 28 +
 8 files changed, 58 insertions(+), 87 deletions(-)



(arrow) branch main updated: MINOR: [R] refactor arrow_mask to include aggregations list (#41414)

2024-04-29 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new c87073737b MINOR: [R] refactor arrow_mask to include aggregations list (#41414)
c87073737b is described below

commit c87073737b6ffef9715549a199499b92630e8e5f
Author: Neal Richardson 
AuthorDate: Mon Apr 29 11:32:01 2024 -0400

MINOR: [R] refactor arrow_mask to include aggregations list (#41414)

### Rationale for this change

Keeping the `..aggregations` list in parent.frame() felt a little wrong.
As we're starting to use this in more places (like mutate() in #41350, and
potentially others), I wanted to improve it. I had tried several approaches
to putting it somewhere better (like in the mask) but failed; this one
finally worked.

### What changes are included in this PR?

Just a refactor

### Are these changes tested?

Existing tests pass.

### Are there any user-facing changes?

Nope.
---
 r/R/dplyr-eval.R  |  8 +++-
 r/R/dplyr-funcs-agg.R | 23 ---
 r/R/dplyr-summarize.R | 41 ++---
 3 files changed, 33 insertions(+), 39 deletions(-)

diff --git a/r/R/dplyr-eval.R b/r/R/dplyr-eval.R
index 3aaa29696b..ff1619ce94 100644
--- a/r/R/dplyr-eval.R
+++ b/r/R/dplyr-eval.R
@@ -125,13 +125,9 @@ arrow_mask <- function(.data, aggregation = FALSE) {
   f_env <- new_environment(.cache$functions)
 
   if (aggregation) {
-# Add the aggregation functions to the environment, and set the enclosing
-# environment to the parent frame so that, when called from summarize_eval(),
-# they can reference and assign into `..aggregations` defined there.
-pf <- parent.frame()
+# Add the aggregation functions to the environment.
 for (f in names(agg_funcs)) {
   f_env[[f]] <- agg_funcs[[f]]
-  environment(f_env[[f]]) <- pf
 }
   } else {
 # Add functions that need to error hard and clear.
@@ -156,6 +152,8 @@ arrow_mask <- function(.data, aggregation = FALSE) {
   # TODO: figure out what rlang::as_data_pronoun does/why we should use it
   # (because if we do we get `Error: Can't modify the data pronoun` in mutate())
   out$.data <- .data$selected_columns
+  # Add the aggregations list to collect any that get pulled out when evaluating
+  out$.aggregations <- empty_named_list()
   out
 }
 
diff --git a/r/R/dplyr-funcs-agg.R b/r/R/dplyr-funcs-agg.R
index ab1df1d2f1..d84f8f28f0 100644
--- a/r/R/dplyr-funcs-agg.R
+++ b/r/R/dplyr-funcs-agg.R
@@ -17,7 +17,7 @@
 
 # Aggregation functions
 #
-# These all insert into an ..aggregations list (in a parent frame) a list containing:
+# These all insert into an .aggregations list in the mask, a list containing:
 # @param fun string function name
 # @param data list of 0 or more Expressions
 # @param options list of function options, as passed to call_function
@@ -154,11 +154,11 @@ register_bindings_aggregate <- function() {
 
 set_agg <- function(...) {
   agg_data <- list2(...)
-  # Find the environment where ..aggregations is stored
+  # Find the environment where .aggregations is stored
   target <- find_aggregations_env()
-  aggs <- get("..aggregations", target)
+  aggs <- get(".aggregations", target)
   lapply(agg_data[["data"]], function(expr) {
-# If any of the fields referenced in the expression are in ..aggregations,
+# If any of the fields referenced in the expression are in .aggregations,
 # then we can't aggregate over them.
 # This is mainly for combinations of dataset columns and aggregations,
 # like sum(x - mean(x)), i.e. window functions.
@@ -169,23 +169,24 @@ set_agg <- function(...) {
 }
   })
 
-  # Record the (fun, data, options) in ..aggregations
+  # Record the (fun, data, options) in .aggregations
   # and return a FieldRef pointing to it
   tmpname <- paste0("..temp", length(aggs))
   aggs[[tmpname]] <- agg_data
-  assign("..aggregations", aggs, envir = target)
+  assign(".aggregations", aggs, envir = target)
   Expression$field_ref(tmpname)
 }
 
 find_aggregations_env <- function() {
-  # Find the environment where ..aggregations is stored,
+  # Find the environment where .aggregations is stored,
   # it's in parent.env of something in the call stack
-  for (f in sys.frames()) {
-if (exists("..aggregations", envir = f)) {
-  return(f)
+  n <- 1
+  while (TRUE) {
+if (exists(".aggregations", envir = caller_env(n))) {
+  return(caller_env(n))
 }
+n <- n + 1
   }
-  stop("Could not find ..aggregations")
 }
 
 ensure_one_arg <- function(args, fun) {
diff --git a/r/R/dplyr-summarize.R b/r/R/dplyr-summarize.R
index 5bb81dc2b3
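The frame-walking idea behind `find_aggregations_env()` can be demonstrated in base R. A simplified sketch: it uses `sys.frames()` rather than `rlang::caller_env()`, and the names `find_var_frame`, `collect`, and `caller` are invented for the example.

```r
find_var_frame <- function(name) {
  # search calling frames, innermost first, for one that defines `name`
  for (f in rev(sys.frames())) {
    if (exists(name, envir = f, inherits = FALSE)) return(f)
  }
  stop("Could not find ", name)
}

collect <- function() {
  # record an aggregation into whichever caller defined .aggregations
  env <- find_var_frame(".aggregations")
  env$.aggregations <- c(env$.aggregations, list("recorded"))
}

caller <- function() {
  .aggregations <- list()
  collect()
  length(.aggregations)
}
caller()  # 1
```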

(arrow) branch main updated: GH-41358: [R] Support join "na_matches" argument (#41372)

2024-04-26 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new ea314a3f8d GH-41358: [R] Support join "na_matches" argument (#41372)
ea314a3f8d is described below

commit ea314a3f8d9d4446836aa999b66659c07421f7a4
Author: Neal Richardson 
AuthorDate: Fri Apr 26 18:32:32 2024 -0400

GH-41358: [R] Support join "na_matches" argument (#41372)

### Rationale for this change

I noticed this in #41350 and made #41358 to implement it in C++, but it
turns out the option was already there, just buried a bit.

### What changes are included in this PR?

`na_matches` is mapped through to the `key_cmp` field in
`HashJoinNodeOptions`. Acero supports having a different value for this
for each of the join keys, but dplyr does not, so I kept it constant for
all key columns to match the dplyr behavior.

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes
* GitHub Issue: #41358
---
 r/NEWS.md  |  1 +
 r/R/arrow-package.R| 12 ++--
 r/R/arrowExports.R |  4 ++--
 r/R/dplyr-funcs-doc.R  | 12 ++--
 r/R/dplyr-join.R   |  8 +---
 r/R/query-engine.R |  8 +---
 r/man/acero.Rd | 12 ++--
 r/src/arrowExports.cpp | 11 ++-
 r/src/compute-exec.cpp | 18 +-
 r/tests/testthat/test-dplyr-join.R | 32 
 10 files changed, 82 insertions(+), 36 deletions(-)
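Usage, per the description above. A sketch only: it requires an arrow build containing this change, and the join results are not shown here.

```r
library(arrow)
library(dplyr)

left  <- arrow_table(k = c(1, NA), v = c("x", "y"))
right <- arrow_table(k = c(1, NA), w = c("p", "q"))

# "na" (dplyr's default): NA join keys match each other
left |> left_join(right, by = "k", na_matches = "na") |> collect()

# "never": NA join keys never match, as in SQL joins
left |> left_join(right, by = "k", na_matches = "never") |> collect()
```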

diff --git a/r/NEWS.md b/r/NEWS.md
index 4ed9f28a28..05f934dac6 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -21,6 +21,7 @@
 
* R functions that users write that use functions that Arrow supports in dataset queries now can be used in queries too. Previously, only functions that used arithmetic operators worked. For example, `time_hours <- function(mins) mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` did not; now both work. These are automatic translations rather than true user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. (#41223)
* `summarize()` supports more complex expressions, and correctly handles cases where column names are reused in expressions.
+* The `na_matches` argument to the `dplyr::*_join()` functions is now supported. This argument controls whether `NA` values are considered equal when joining. (#41358)
 
 # arrow 16.0.0
 
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index f6977e6262..7087a40c49 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -66,12 +66,12 @@ supported_dplyr_methods <- list(
   compute = NULL,
   collapse = NULL,
   distinct = "`.keep_all = TRUE` not supported",
-  left_join = "the `copy` and `na_matches` arguments are ignored",
-  right_join = "the `copy` and `na_matches` arguments are ignored",
-  inner_join = "the `copy` and `na_matches` arguments are ignored",
-  full_join = "the `copy` and `na_matches` arguments are ignored",
-  semi_join = "the `copy` and `na_matches` arguments are ignored",
-  anti_join = "the `copy` and `na_matches` arguments are ignored",
+  left_join = "the `copy` argument is ignored",
+  right_join = "the `copy` argument is ignored",
+  inner_join = "the `copy` argument is ignored",
+  full_join = "the `copy` argument is ignored",
+  semi_join = "the `copy` argument is ignored",
+  anti_join = "the `copy` argument is ignored",
   count = NULL,
   tally = NULL,
   rename_with = NULL,
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 752d3a266b..62e2182ffc 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -484,8 +484,8 @@ ExecNode_Aggregate <- function(input, options, key_names) {
   .Call(`_arrow_ExecNode_Aggregate`, input, options, key_names)
 }
 
-ExecNode_Join <- function(input, join_type, right_data, left_keys, right_keys, left_output, right_output, output_suffix_for_left, output_suffix_for_right) {
-  .Call(`_arrow_ExecNode_Join`, input, join_type, right_data, left_keys, right_keys, left_output, right_output, output_suffix_for_left, output_suffix_for_right)
+ExecNode_Join <- function(input, join_type, right_data, left_keys, right_keys, left_output, right_output, output_suffix_for_left, output_suffix_for_right, na_matches) {
+  .Call(`_arrow_ExecNode_Join`, input, join_type, right_data, left_keys, right_keys, left_output, right_output, output_suffix_for_left, output_suffix_for_right, na_matches)
 }
 
 ExecNode_Union <- function(input, right_data) {
diff --git a/r/R/dplyr-funcs-doc.R b/r/R/dplyr-funcs-doc.R
index 2042f80014..fda77bca83 100644
--

(arrow) branch main updated: MINOR: [R] refactor: move aggregation function bindings to their own file (#41355)

2024-04-23 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new f1bc82f2b3 MINOR: [R] refactor: move aggregation function bindings to their own file (#41355)
f1bc82f2b3 is described below

commit f1bc82f2b39a317970427052c360383f983ec3f8
Author: Neal Richardson 
AuthorDate: Tue Apr 23 13:31:26 2024 -0400

MINOR: [R] refactor: move aggregation function bindings to their own file (#41355)

For consistency with other bindings, and to allow `dplyr-summarize.R` to
start with the summarize method, as do the other dplyr verb files.
---
 r/DESCRIPTION |   1 +
 r/R/dplyr-funcs-agg.R | 198 ++
 r/R/dplyr-funcs.R |  16 +++-
 r/R/dplyr-summarize.R | 195 -
 4 files changed, 213 insertions(+), 197 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 2efaed4d6c..eeff8168b3 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -107,6 +107,7 @@ Collate:
 'dplyr-distinct.R'
 'dplyr-eval.R'
 'dplyr-filter.R'
+'dplyr-funcs-agg.R'
 'dplyr-funcs-augmented.R'
 'dplyr-funcs-conditional.R'
 'dplyr-funcs-datetime.R'
diff --git a/r/R/dplyr-funcs-agg.R b/r/R/dplyr-funcs-agg.R
new file mode 100644
index 00..ab1df1d2f1
--- /dev/null
+++ b/r/R/dplyr-funcs-agg.R
@@ -0,0 +1,198 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Aggregation functions
+#
+# These all insert into an ..aggregations list (in a parent frame) a list containing:
+# @param fun string function name
+# @param data list of 0 or more Expressions
+# @param options list of function options, as passed to call_function
+# The functions return a FieldRef pointing to the result of the aggregation.
+#
+# For group-by aggregation, `hash_` gets prepended to the function name when
+# the query is executed.
+# So to see a list of available hash aggregation functions,
+# you can use list_compute_functions("^hash_")
+
+register_bindings_aggregate <- function() {
+  register_binding_agg("base::sum", function(..., na.rm = FALSE) {
+set_agg(
+  fun = "sum",
+  data = ensure_one_arg(list2(...), "sum"),
+  options = list(skip_nulls = na.rm, min_count = 0L)
+)
+  })
+  register_binding_agg("base::prod", function(..., na.rm = FALSE) {
+set_agg(
+  fun = "product",
+  data = ensure_one_arg(list2(...), "prod"),
+  options = list(skip_nulls = na.rm, min_count = 0L)
+)
+  })
+  register_binding_agg("base::any", function(..., na.rm = FALSE) {
+set_agg(
+  fun = "any",
+  data = ensure_one_arg(list2(...), "any"),
+  options = list(skip_nulls = na.rm, min_count = 0L)
+)
+  })
+  register_binding_agg("base::all", function(..., na.rm = FALSE) {
+set_agg(
+  fun = "all",
+  data = ensure_one_arg(list2(...), "all"),
+  options = list(skip_nulls = na.rm, min_count = 0L)
+)
+  })
+  register_binding_agg("base::mean", function(x, na.rm = FALSE) {
+set_agg(
+  fun = "mean",
+  data = list(x),
+  options = list(skip_nulls = na.rm, min_count = 0L)
+)
+  })
+  register_binding_agg("stats::sd", function(x, na.rm = FALSE, ddof = 1) {
+set_agg(
+  fun = "stddev",
+  data = list(x),
+  options = list(skip_nulls = na.rm, min_count = 0L, ddof = ddof)
+)
+  })
+  register_binding_agg("stats::var", function(x, na.rm = FALSE, ddof = 1) {
+set_agg(
+  fun = "variance",
+  data = list(x),
+  options = list(skip_nulls = na.rm, min_count = 0L, ddof = ddof)
+)
+  })
+  register_binding_agg(
+"stats::quantile",
+function(x, probs, na.rm = FALSE) {
+  if (length(probs) != 1) {
+arrow_not_supported("quantile() with length(probs) != 1")
+  }
+  # TODO: Bind to the Arrow function that returns an exact quantile and remove
+  # this warni

(arrow) branch main updated (79799e59b1 -> 5865e96db2)

2024-04-22 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 79799e59b1 GH-39664: [C++][Acero] Ensure Acero benchmarks present a metric for identifying throughput (#40884)
 add 5865e96db2 GH-41323: [R] Redo how summarize() evaluates expressions (#41223)

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md   |   3 +
 r/R/arrowExports.R  |   4 +
 r/R/dplyr-across.R  |   1 -
 r/R/dplyr-eval.R|  76 +-
 r/R/dplyr-summarize.R   | 345 +++-
 r/R/expression.R|   3 +
 r/src/arrowExports.cpp  |   9 +
 r/src/expression.cpp|  17 ++
 r/tests/testthat/test-dplyr-across.R|  20 +-
 r/tests/testthat/test-dplyr-filter.R|   1 -
 r/tests/testthat/test-dplyr-funcs-conditional.R |  15 ++
 r/tests/testthat/test-dplyr-summarize.R | 137 --
 12 files changed, 398 insertions(+), 233 deletions(-)



[arrow-site] branch main updated: MINOR: Update some affiliations (#361)

2023-05-31 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/main by this push:
 new 09c2ba8633a MINOR: Update some affiliations (#361)
09c2ba8633a is described below

commit 09c2ba8633a8cbf1acf813bd83c41c5a1e861ff6
Author: Neal Richardson 
AuthorDate: Wed May 31 16:50:30 2023 -0400

MINOR: Update some affiliations (#361)
---
 _data/committers.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_data/committers.yml b/_data/committers.yml
index f0d65228e26..a72faa0b1e1 100644
--- a/_data/committers.yml
+++ b/_data/committers.yml
@@ -130,7 +130,7 @@
 - name: Neal Richardson
   role: PMC
   alias: npr
-  affiliation: Voltron Data
+  affiliation: Posit
 - name: Neville Dipale
   role: PMC
   alias: nevime
@@ -371,7 +371,7 @@
 - name: Romain Francois
   role: Committer
   alias: romainfrancois
-  affiliation: RStudio
+  affiliation: Posit
 - name: Ruihang Xia
   role: Committer
   alias: waynexia



[arrow] branch main updated: MINOR: [R] ARROW_ACERO should be ON by default in bundled build (#35407)

2023-05-05 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new c2f7d13e16 MINOR: [R] ARROW_ACERO should be ON by default in bundled build (#35407)
c2f7d13e16 is described below

commit c2f7d13e16c4ec3c8fba551a157cd71398194e6f
Author: Neal Richardson 
AuthorDate: Fri May 5 10:04:06 2023 -0400

MINOR: [R] ARROW_ACERO should be ON by default in bundled build (#35407)

To match ARROW_DATASET. Without this, the default CRAN version on Linux
won't have Acero enabled.

This should be cherry-picked for the 12.0.0 CRAN submission.
---
 r/inst/build_arrow_static.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/r/inst/build_arrow_static.sh b/r/inst/build_arrow_static.sh
index 4c7f705708..fe56b9fca9 100755
--- a/r/inst/build_arrow_static.sh
+++ b/r/inst/build_arrow_static.sh
@@ -61,7 +61,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
 -DARROW_BUILD_TESTS=OFF \
 -DARROW_BUILD_SHARED=OFF \
 -DARROW_BUILD_STATIC=ON \
--DARROW_ACERO=${ARROW_ACERO:-$ARROW_DEFAULT_PARAM} \
+-DARROW_ACERO=${ARROW_ACERO:-ON} \
 -DARROW_COMPUTE=ON \
 -DARROW_CSV=ON \
 -DARROW_DATASET=${ARROW_DATASET:-ON} \



[arrow] branch main updated: GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow (#35147)

2023-05-03 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
 new ec89360212 GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow (#35147)
ec89360212 is described below

commit ec893602124b776fb42261361d1a2d21a6d61f06
Author: Neal Richardson 
AuthorDate: Wed May 3 10:37:09 2023 -0400

GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow (#35147)

I've significantly rewritten `r/configure` to make it easier to reason 
about and harder for issues like https://github.com/apache/arrow/pull/34229 and 
#35140 to happen. I've also added a version check to make sure that we don't 
obviously try to use a system C++ library that doesn't match the R package 
version. Making sure this was applied in all of the right places and handling 
what to do if the versions didn't match was the impetus for the whole refactor.

`configure` has been broken up into some functions, and the flow of the 
script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happens in one place now,
and it all prefers using `pkg-config` to read from the libarrow build what
libraries and flags it requires, rather than hard-coding them. (autobrew is the
only remaining exception, but I didn't feel like messing with that today.) This
should make the builds more future-proof and allow more build configurations to
work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten
picked up correctly b [...]

Version checking has been added in an R script for ease of testing (and for 
easier handling of arithmetic), and there is an accompanying 
`test-check-versions.R` added. These are run on all the builds that use 
`ci/scripts/r_test.sh`.

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), 
but the version does not match, it will not be used, and we will try a bundled 
build. This should mean that users installing a released version will never 
have libarrow version problems.
* If both the found C++ library and R package are on matching dev versions 
(i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it 
will proceed with a warning that you may need to rebuild if there are issues. 
This means that regular developers will see an extra message in the build 
output.
* autobrew is only used on a release version unless you set 
FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ 
release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has 
been removed. Use `ARROW_HOME` instead.

* Closes: #35140
* Closes: #31989
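The version-compatibility rule in the first behavior change can be sketched in a few lines of R. A simplification for illustration only: the real logic lives in `r/tools/check-versions.R`, `compatible` is an invented name, and since `package_version()` cannot parse the C++ `-SNAPSHOT` suffix, it is assumed stripped here.

```r
compatible <- function(r_version, cpp_version) {
  r <- unlist(package_version(r_version))
  cpp <- unlist(package_version(cpp_version))
  if (length(r) >= 4 && r[[4]] >= 9000) {
    # dev R version x.y.z.9000 pairs with the next C++ major version (x+1)
    cpp[[1]] == r[[1]] + 1
  } else {
    # release versions must agree on the major version
    cpp[[1]] == r[[1]]
  }
}

compatible("12.0.0", "12.0.0")       # TRUE: matching release versions
compatible("12.0.0.9000", "13.0.0")  # TRUE: dev R pairs with next C++ major
compatible("12.0.0", "13.0.0")       # FALSE: mismatched release versions
```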

Lead-authored-by: Neal Richardson 
Co-authored-by: Sutou Kouhei 
Signed-off-by: Neal Richardson 
---
 dev/tasks/conda-recipes/r-arrow/meta.yaml  |   4 +-
 dev/tasks/r/github.macos.autobrew.yml  |   1 +
 dev/tasks/r/github.packages.yml|   3 +-
 r/Makefile |   2 +-
 r/configure| 555 -
 r/inst/build_arrow_static.sh   |   8 +-
 r/tools/check-versions.R   |  59 +++
 r/tools/nixlibs.R  |  30 +-
 r/tools/test-check-versions.R  |  62 
 r/vignettes/developers/install_details.Rmd |  42 ++-
 r/vignettes/developers/install_nix.png | Bin 99333 -> 0 bytes
 r/vignettes/install.Rmd|   9 +-
 r/vignettes/install_nightly.Rmd|   2 +-
 13 files changed, 502 insertions(+), 275 deletions(-)

diff --git a/dev/tasks/conda-recipes/r-arrow/meta.yaml 
b/dev/tasks/conda-recipes/r-arrow/meta.yaml
index 4c86dc9280..28ee8eb92c 100644
--- a/dev/tasks/conda-recipes/r-arrow/meta.yaml
+++ b/dev/tasks/conda-recipes/r-arrow/meta.yaml
@@ -59,8 +59,8 @@ requirements:
 
 test:
   commands:
-- $R -e "library('arrow')"   # [not win]
-- "\"%R%\" -e \"library('arrow'); data(mtcars); write_parquet(mtcars, 
'test.parquet')\""  # [win]
+- $R -e "library('arrow'); stopifnot(arrow_with_acero(), 
arrow_with_dataset(), arrow_with_parquet(), arrow_with_s3())"   # [not 
win]
+- "\"%R%\" -e \

[arrow] branch main updated (7526df9ad9 -> 14e9e3cb13)

2023-04-07 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 7526df9ad9 GH-34946: [Ruby] Remove DictionaryArrayBuilder related 
omissions (#34947)
 add 14e9e3cb13 MINOR: [R] Unskip acero tests (#34943)

No new revisions were added by this update.

Summary of changes:
 r/R/arrow-info.R | 1 +
 1 file changed, 1 insertion(+)



[arrow] branch master updated: GH-33892: [R] Map `dplyr::n()` to `count_all` kernel (#33917)

2023-02-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 0368e410be GH-33892: [R] Map `dplyr::n()` to `count_all` kernel 
(#33917)
0368e410be is described below

commit 0368e410be4dac30eada13d307b415165aedc6a7
Author: Ian Cook 
AuthorDate: Mon Feb 13 10:16:03 2023 -0500

GH-33892: [R] Map `dplyr::n()` to `count_all` kernel (#33917)

### Rationale for this change

This PR is a follow-up to #15083. It allows the R package to register 
bindings to nullary aggregation functions, and it remaps `dplyr::n()` to the 
nullary aggregation function `count_all`.

This PR also:
- Prepares the R bindings to support aggregation functions with 2+ 
arguments, although none yet exist in the C++ library
- Removes the heuristics that were used to infer the data types of 
aggregates, replacing that with actual type determination
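The binding shape described above can be sketched as follows — an illustrative Python sketch of the list each aggregation binding returns (`fun`, `data`, `options`), with `data` now a list of zero or more expressions; `bind_n` and `bind_sum` are hypothetical names, not the R package's API:

```python
def bind_n():
    # dplyr::n() maps to the nullary "count_all" kernel: it reads no
    # input column, so `data` is an empty list.
    return {"fun": "count_all", "data": [], "options": {}}

def bind_sum(expr):
    # A unary aggregation carries its single input expression in `data`.
    return {"fun": "sum", "data": [expr], "options": {"skip_nulls": True}}
```

For group-by aggregation the R package prepends `hash_` to the function name (e.g. `hash_count_all`).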

### Are these changes tested?

Yes, through existing tests.

### Are there any user-facing changes?

No.
* Closes: #33892
* Closes: #33960

Authored-by: Ian Cook 
Signed-off-by: Neal Richardson 
---
 r/R/dplyr-collect.R |  18 +++---
 r/R/dplyr-funcs.R   |   2 +-
 r/R/dplyr-summarize.R   | 102 +---
 r/R/query-engine.R  |  12 ++--
 r/man/register_binding.Rd   |   2 +-
 r/src/compute-exec.cpp  |   8 ++-
 r/tests/testthat/test-dplyr-collapse.R  |   4 +-
 r/tests/testthat/test-dplyr-summarize.R |  10 +++-
 8 files changed, 103 insertions(+), 55 deletions(-)

diff --git a/r/R/dplyr-collect.R b/r/R/dplyr-collect.R
index 395026ce78..f45a9886ea 100644
--- a/r/R/dplyr-collect.R
+++ b/r/R/dplyr-collect.R
@@ -179,19 +179,15 @@ implicit_schema <- function(.data) {
   new_fields <- c(left_fields, right_fields)
 }
   } else {
-# The output schema is based on the aggregations and any group_by vars
-new_fields <- map(summarize_projection(.data), ~ .$type(old_schm))
-# * Put group_by_vars first (this can't be done by summarize,
-#   they have to be last per the aggregate node signature,
-#   and they get projected to this order after aggregation)
-# * Infer the output types from the aggregations
-group_fields <- new_fields[.data$group_by_vars]
 hash <- length(.data$group_by_vars) > 0
-agg_fields <- imap(
-  new_fields[setdiff(names(new_fields), .data$group_by_vars)],
-  ~ agg_fun_output_type(.data$aggregations[[.y]][["fun"]], .x, hash)
+# The output schema is based on the aggregations and any group_by vars.
+# The group_by vars come first (this can't be done by summarize; they have
+# to be last per the aggregate node signature, and they get projected to
+# this order after aggregation)
+new_fields <- c(
+  group_types(.data, old_schm),
+  aggregate_types(.data, hash, old_schm)
 )
-new_fields <- c(group_fields, agg_fields)
   }
   schema(!!!new_fields)
 }
diff --git a/r/R/dplyr-funcs.R b/r/R/dplyr-funcs.R
index ce88e25bcb..2728a64539 100644
--- a/r/R/dplyr-funcs.R
+++ b/r/R/dplyr-funcs.R
@@ -49,7 +49,7 @@ NULL
 #'   aggregate function. This function must accept `Expression` objects as
 #'   arguments and return a `list()` with components:
 #'   - `fun`: string function name
-#'   - `data`: `Expression` (these are all currently a single field)
+#'   - `data`: list of 0 or more `Expression`s
 #'   - `options`: list of function options, as passed to call_function
 #' @param update_cache Update .cache$functions at the time of registration.
 #'   the default is FALSE because the majority of usage is to register
diff --git a/r/R/dplyr-summarize.R b/r/R/dplyr-summarize.R
index 5e670538f6..184c0aade4 100644
--- a/r/R/dplyr-summarize.R
+++ b/r/R/dplyr-summarize.R
@@ -18,7 +18,7 @@
 # Aggregation functions
 # These all return a list of:
 # @param fun string function name
-# @param data Expression (these are all currently a single field)
+# @param data list of 0 or more Expressions
 # @param options list of function options, as passed to call_function
 # For group-by aggregation, `hash_` gets prepended to the function name.
 # So to see a list of available hash aggregation functions,
@@ -31,28 +31,7 @@ ensure_one_arg <- function(args, fun) {
   } else if (length(args) > 1) {
 arrow_not_supported(paste0("Multiple arguments to ", fun, "()"))
   }
-  args[[1]]
-}
-
-agg_fun_output_type <- function(fun, input_type, hash) {
-  # These are quick and dirty heuristics.
-  if (fun %in% c("any", "all")) {
-bool()
-  } else if (fun %in% "sum") {
-# It may upcast to a bigger type but this is close enough
-input_type
-  } else if (fu

[arrow] branch master updated: GH-33760: [R][C++] Handle nested field refs in scanner (#33770)

2023-01-24 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d0a7fb9403 GH-33760: [R][C++] Handle nested field refs in scanner  
(#33770)
d0a7fb9403 is described below

commit d0a7fb9403a904b7850517c745c3925695d8658d
Author: Neal Richardson 
AuthorDate: Tue Jan 24 11:56:30 2023 -0500

GH-33760: [R][C++] Handle nested field refs in scanner  (#33770)

### Rationale for this change

Follow-up to https://github.com/apache/arrow/pull/19706/files#r1073391100 
with the goal of deleting and simplifying some code. As it turned out, it was 
more about moving code from the R bindings to the C++ library.

### Are there any user-facing changes?

Not for R users, but this fixes a bug in the dataset C++ library where 
nested field refs could not be handled by the scanner.
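The scanner change can be sketched as follows; an illustrative Python analogue (not the C++ implementation) of how projection field refs are reduced to a set of top-level schema field names:

```python
def projected_field_names(refs):
    # Each ref is a tuple of path components; a nested ref like
    # ("point", "x") contributes only its top-level name, and the set
    # deduplicates fields referenced more than once.
    names = set()
    for ref in refs:
        names.add(ref[0])
    return sorted(names)
```

The C++ version then looks each name up in the dataset schema and silently drops names that are absent (e.g. augmented fields not present in the provided schema).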

* Closes: #33760

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 cpp/src/arrow/dataset/scanner.cc | 23 ---
 r/R/arrowExports.R   |  9 +++--
 r/R/query-engine.R   | 10 +++---
 r/src/arrowExports.cpp   | 19 +--
 r/src/compute-exec.cpp   | 23 +--
 r/src/expression.cpp | 19 ---
 6 files changed, 40 insertions(+), 63 deletions(-)

diff --git a/cpp/src/arrow/dataset/scanner.cc b/cpp/src/arrow/dataset/scanner.cc
index f307787357..bc8feec96d 100644
--- a/cpp/src/arrow/dataset/scanner.cc
+++ b/cpp/src/arrow/dataset/scanner.cc
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include <set>
 #include 
 
 #include "arrow/array/array_primitive.h"
@@ -135,6 +136,7 @@ Result<std::shared_ptr<Schema>> GetProjectedSchemaFromExpression(
    const std::shared_ptr<Schema>& dataset_schema) {
   // process resultant dataset_schema after projection
   FieldVector project_fields;
  std::set<std::string> field_names;
   if (auto call = projection.call()) {
 if (call->function_name != "make_struct") {
   return Status::Invalid("Top level projection expression call must be 
make_struct");
@@ -142,13 +144,11 @@ Result<std::shared_ptr<Schema>> GetProjectedSchemaFromExpression(
 for (const compute::Expression& arg : call->arguments) {
   if (auto field_ref = arg.field_ref()) {
 if (field_ref->IsName()) {
-  auto field = dataset_schema->GetFieldByName(*field_ref->name());
-  if (field) {
-project_fields.push_back(std::move(field));
-  }
-  // if the field is not present in the schema we ignore it.
-  // the case is if kAugmentedFields are present in the expression
-  // and if they are not present in the provided schema, we ignore 
them.
+  field_names.emplace(*field_ref->name());
+} else if (field_ref->IsNested()) {
+  // We keep the top-level field name.
+  auto nested_field_refs = *field_ref->nested_refs();
+  field_names.emplace(*nested_field_refs[0].name());
 } else {
   return Status::Invalid(
   "No projected schema was supplied and we could not infer the 
projected "
@@ -157,6 +157,15 @@ Result<std::shared_ptr<Schema>> GetProjectedSchemaFromExpression(
   }
 }
   }
+  for (auto f : field_names) {
+auto field = dataset_schema->GetFieldByName(f);
+if (field) {
+  // if the field is not present in the schema we ignore it.
+  // the case is if kAugmentedFields are present in the expression
+  // and if they are not present in the provided schema, we ignore them.
+  project_fields.push_back(std::move(field));
+}
+  }
   return schema(project_fields);
 }
 
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 2eeca24dbd..5e807fbab1 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -460,8 +460,8 @@ ExecNode_output_schema <- function(node) {
   .Call(`_arrow_ExecNode_output_schema`, node)
 }
 
-ExecNode_Scan <- function(plan, dataset, filter, materialized_field_names) {
-  .Call(`_arrow_ExecNode_Scan`, plan, dataset, filter, 
materialized_field_names)
+ExecNode_Scan <- function(plan, dataset, filter, projection) {
+  .Call(`_arrow_ExecNode_Scan`, plan, dataset, filter, projection)
 }
 
 ExecPlan_Write <- function(plan, final_node, metadata, file_write_options, 
filesystem, base_dir, partitioning, basename_template, existing_data_behavior, 
max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, 
max_rows_per_group) {
@@ -1088,10 +1088,6 @@ compute___expr__is_field_ref <- function(x) {
   .Call(`_arrow_compute___expr__is_field_ref`, x)
 }
 
-field_names_in_expression <- function(x) {
-  .Call(`_arrow_field_names_in_expression`, x)
-}
-
 compute___expr__get_field_ref_name <- function(x) {
   .Call(`_arrow_compute___expr__get_field_ref_name`, x)
 }
@@ -2095

[arrow] branch master updated: GH-18818: [R] Create a field ref to a field in a struct (#19706)

2023-01-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 1d9366f19e GH-18818: [R] Create a field ref to a field in a struct 
(#19706)
1d9366f19e is described below

commit 1d9366f19e4b9846b33cc0c7bd7941cb5f482d74
Author: Neal Richardson 
AuthorDate: Wed Jan 18 12:38:06 2023 -0500

GH-18818: [R] Create a field ref to a field in a struct (#19706)

This PR implements `$.Expression` and `[[.Expression` methods, such that if 
the Expression is a FieldRef, it returns a nested FieldRef. This required 
revising some assumptions in a few places, particularly that if an Expression 
is a FieldRef, it has a `name`, and that all FieldRefs correspond to a Field in 
a Schema. In the case where the Expression is not a FieldRef, it will create an 
Expression call to `struct_field` to extract the field, iff the Expression has 
a knowable `type`, the [...]
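The method behavior can be sketched in a Python analogue (illustrative only; the real methods are R's `$.Expression` and `[[.Expression`, and `Expr`/`path` here are hypothetical names):

```python
class Expr:
    """Toy stand-in for a field-ref Expression: attribute access on a
    field ref yields a nested field ref, one path component deeper."""
    def __init__(self, path):
        self.path = tuple(path)
    def __getattr__(self, name):
        return Expr(self.path + (name,))
```

So `Expr(("a",)).b.c` carries the nested path `("a", "b", "c")`, much as `expr$b$c` builds a nested FieldRef.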

Things not done because they weren't needed to get this working:

  * Making `Expression$field_ref()` take a vector to construct a nested ref
  * A method to return the vector of nested components of a field ref in R

Next steps for future PRs:

* Wrap this in 
[tidyr::unpack()](https://tidyr.tidyverse.org/reference/pack.html) method (but 
unfortunately, unpack() is not a generic)
* https://github.com/apache/arrow/issues/33756
* https://github.com/apache/arrow/issues/33757
* https://github.com/apache/arrow/issues/33760

* Closes: #18818

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/NAMESPACE |  3 ++
 r/R/arrow-object.R  |  2 +-
 r/R/arrowExports.R  |  9 -
 r/R/expression.R| 55 +
 r/R/type.R  |  3 ++
 r/src/arrowExports.cpp  | 19 ++
 r/src/compute.cpp   | 14 
 r/src/expression.cpp| 40 +++--
 r/tests/testthat/test-dplyr-query.R | 70 +
 r/tests/testthat/test-expression.R  | 26 ++
 10 files changed, 237 insertions(+), 4 deletions(-)

diff --git a/r/NAMESPACE b/r/NAMESPACE
index 3df107a2d8..3ab828a958 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -2,6 +2,7 @@
 
 S3method("!=",ArrowObject)
 S3method("$",ArrowTabular)
+S3method("$",Expression)
 S3method("$",Schema)
 S3method("$",StructArray)
 S3method("$",SubTreeFileSystem)
@@ -14,6 +15,7 @@ S3method("[",Dataset)
 S3method("[",Schema)
 S3method("[",arrow_dplyr_query)
 S3method("[[",ArrowTabular)
+S3method("[[",Expression)
 S3method("[[",Schema)
 S3method("[[",StructArray)
 S3method("[[<-",ArrowTabular)
@@ -137,6 +139,7 @@ S3method(names,Scanner)
 S3method(names,ScannerBuilder)
 S3method(names,Schema)
 S3method(names,StructArray)
+S3method(names,StructType)
 S3method(names,Table)
 S3method(names,arrow_dplyr_query)
 S3method(print,"arrow-enum")
diff --git a/r/R/arrow-object.R b/r/R/arrow-object.R
index 516f407aaf..5c2cf4691f 100644
--- a/r/R/arrow-object.R
+++ b/r/R/arrow-object.R
@@ -32,7 +32,7 @@ ArrowObject <- R6Class("ArrowObject",
   assign(".:xp:.", xp, envir = self)
 },
 class_title = function() {
-  if (!is.null(self$.class_title)) {
+  if (".class_title" %in% ls(self, all.names = TRUE)) {
 # Allow subclasses to override just printing the class name first
 class_title <- self$.class_title()
   } else {
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 38f1ecfb97..2eeca24dbd 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -1084,6 +1084,10 @@ compute___expr__call <- function(func_name, 
argument_list, options) {
   .Call(`_arrow_compute___expr__call`, func_name, argument_list, options)
 }
 
+compute___expr__is_field_ref <- function(x) {
+  .Call(`_arrow_compute___expr__is_field_ref`, x)
+}
+
 field_names_in_expression <- function(x) {
   .Call(`_arrow_field_names_in_expression`, x)
 }
@@ -1096,6 +1100,10 @@ compute___expr__field_ref <- function(name) {
   .Call(`_arrow_compute___expr__field_ref`, name)
 }
 
+compute___expr__nested_field_ref <- function(x, name) {
+  .Call(`_arrow_compute___expr__nested_field_ref`, x, name)
+}
+
 compute___expr__scalar <- function(x) {
   .Call(`_arrow_compute___expr__scalar`, x)
 }
@@ -2087,4 +2095,3 @@ SetIOThreadPoolCapacity <- function(threads) {
 Array__infer_type <- function(x) {
   .Call(`_arrow_Array__infer_type`, x)
 }
-
diff --git a/r/R/expression.R b/r/R/expression.R
index a1163c12a8..8f84b4b31e 100644
--- a/r/R/expression.R
+++ b/r/R/expression.R
@@ -57,6 +57,9 @@ Expression <-

[arrow] branch master updated: MINOR: [R] Fix for dev purrr (#14581)

2022-11-03 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 04917f944b MINOR: [R] Fix for dev purrr (#14581)
04917f944b is described below

commit 04917f944b65b73cc954b5b243f193a5b336f0f8
Author: Hadley Wickham 
AuthorDate: Thu Nov 3 12:00:51 2022 -0500

MINOR: [R] Fix for dev purrr (#14581)

The recycling rules in map2() are now stricter, so we need to check that 
`x` actually has columns before applying the metadata.

I also mildly refactored the test to make it easier to run in isolation; 
I'm happy to revert those changes if desired.
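The stricter recycling can be sketched with a Python analogue (`map2_strict` and `apply_metadata` are illustrative names, not purrr's implementation): mapping two sequences now demands compatible lengths, so a zero-column `x` must be skipped before mapping the per-column metadata over it.

```python
def map2_strict(xs, ys, f):
    # Dev-purrr-style rule: lengths must match, or one side must be
    # length 1 (which is recycled); otherwise it is an error.
    if len(xs) != len(ys) and 1 not in (len(xs), len(ys)):
        raise ValueError("map2: incompatible lengths")
    if len(xs) == 1 and len(ys) != 1:
        xs = list(xs) * len(ys)
    if len(ys) == 1 and len(xs) != 1:
        ys = list(ys) * len(xs)
    return [f(x, y) for x, y in zip(xs, ys)]

def apply_metadata(columns, metadata):
    # Guard mirroring the patch: only map when there are columns at all.
    if len(columns) > 0:
        return map2_strict(columns, metadata, lambda c, m: (c, m))
    return columns
```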

Authored-by: Hadley Wickham 
Signed-off-by: Neal Richardson 
---
 r/R/metadata.R   |  8 +++--
 r/tests/testthat/test-metadata.R | 65 +++-
 2 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/r/R/metadata.R b/r/R/metadata.R
index 747f08069e..6a54b3e384 100644
--- a/r/R/metadata.R
+++ b/r/R/metadata.R
@@ -86,9 +86,11 @@ apply_arrow_r_metadata <- function(x, r_metadata) {
 call. = FALSE
   )
 } else {
-  x <- map2(x, columns_metadata, function(.x, .y) {
-apply_arrow_r_metadata(.x, .y)
-  })
+  if (length(x) > 0) {
+x <- map2(x, columns_metadata, function(.x, .y) {
+  apply_arrow_r_metadata(.x, .y)
+})
+  }
 }
 x
   }
diff --git a/r/tests/testthat/test-metadata.R b/r/tests/testthat/test-metadata.R
index 21b7ebe11a..4cf8e49af1 100644
--- a/r/tests/testthat/test-metadata.R
+++ b/r/tests/testthat/test-metadata.R
@@ -254,8 +254,6 @@ test_that("Row-level metadata (does not) roundtrip in 
datasets", {
   skip_if_not_available("dataset")
   skip_if_not_available("parquet")
 
-  library(dplyr, warn.conflicts = FALSE)
-
   df <- tibble::tibble(
 metadata = list(
   structure(1, my_value_as_attr = 1),
@@ -269,39 +267,36 @@ test_that("Row-level metadata (does not) roundtrip in 
datasets", {
 
   dst_dir <- make_temp_dir()
 
-  withr::with_options(
-list("arrow.preserve_row_level_metadata" = TRUE),
-{
-  expect_warning(
-write_dataset(df, dst_dir, partitioning = "part"),
-"Row-level metadata is not compatible with datasets and will be 
discarded"
-  )
-
-  # Reset directory as previous write will have created some files and the 
default
-  # behavior is to error on existing
-  dst_dir <- make_temp_dir()
-  # but we need to write a dataset with row-level metadata to make sure 
when
-  # reading ones that have been written with them we warn appropriately
-  fake_func_name <- write_dataset
-  fake_func_name(df, dst_dir, partitioning = "part")
-
-  ds <- open_dataset(dst_dir)
-  expect_warning(
-df_from_ds <- collect(ds),
-"Row-level metadata is not compatible with this operation and has been 
ignored"
-  )
-  expect_equal(
-arrange(df_from_ds, int),
-arrange(df, int),
-ignore_attr = TRUE
-  )
-
-  # however there is *no* warning if we don't select the metadata column
-  expect_warning(
-df_from_ds <- ds %>% select(int) %>% collect(),
-NA
-  )
-}
+  withr::local_options("arrow.preserve_row_level_metadata" = TRUE)
+
+  expect_warning(
+write_dataset(df, dst_dir, partitioning = "part"),
+"Row-level metadata is not compatible with datasets and will be discarded"
+  )
+
+  # Reset directory as previous write will have created some files and the 
default
+  # behavior is to error on existing
+  dst_dir <- make_temp_dir()
+  # but we need to write a dataset with row-level metadata to make sure when
+  # reading ones that have been written with them we warn appropriately
+  fake_func_name <- write_dataset
+  fake_func_name(df, dst_dir, partitioning = "part")
+
+  ds <- open_dataset(dst_dir)
+  expect_warning(
+df_from_ds <- collect(ds),
+"Row-level metadata is not compatible with this operation and has been 
ignored"
+  )
+  expect_equal(
+dplyr::arrange(df_from_ds, int),
+dplyr::arrange(df, int),
+ignore_attr = TRUE
+  )
+
+  # however there is *no* warning if we don't select the metadata column
+  expect_warning(
+df_from_ds <- ds %>% dplyr::select(int) %>% dplyr::collect(),
+NA
   )
 })
 



[arrow] branch master updated: ARROW-15460: [R] Add as.data.frame.Dataset method (#14461)

2022-11-02 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e53978b56 ARROW-15460: [R] Add as.data.frame.Dataset method (#14461)
5e53978b56 is described below

commit 5e53978b56aa13f9c033f2e849cc22f2aed6e2d3
Author: Neal Richardson 
AuthorDate: Wed Nov 2 19:15:40 2022 -0400

ARROW-15460: [R] Add as.data.frame.Dataset method (#14461)

Plus some refactoring and disentangling of compute/collect methods

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/NAMESPACE |  2 ++
 r/R/dataset.R   |  7 -
 r/R/dplyr-collect.R | 57 +++--
 r/R/dplyr-group-by.R| 30 ++
 r/R/dplyr.R |  6 -
 r/R/metadata.R  | 13 +++---
 r/R/table.R | 14 +-
 r/man/as_arrow_table.Rd |  3 +++
 r/man/open_dataset.Rd   |  2 +-
 r/tests/testthat/test-dataset.R |  5 
 r/tests/testthat/test-udf.R |  1 +
 11 files changed, 92 insertions(+), 48 deletions(-)

diff --git a/r/NAMESPACE b/r/NAMESPACE
index 4a0c6ed261..0b18ace9ad 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -29,6 +29,7 @@ S3method(as.character,ArrowDatum)
 S3method(as.character,FileFormat)
 S3method(as.character,FragmentScanOptions)
 S3method(as.data.frame,ArrowTabular)
+S3method(as.data.frame,Dataset)
 S3method(as.data.frame,RecordBatchReader)
 S3method(as.data.frame,Schema)
 S3method(as.data.frame,StructArray)
@@ -47,6 +48,7 @@ S3method(as_arrow_array,data.frame)
 S3method(as_arrow_array,default)
 S3method(as_arrow_array,pyarrow.lib.Array)
 S3method(as_arrow_array,vctrs_list_of)
+S3method(as_arrow_table,Dataset)
 S3method(as_arrow_table,RecordBatch)
 S3method(as_arrow_table,RecordBatchReader)
 S3method(as_arrow_table,Schema)
diff --git a/r/R/dataset.R b/r/R/dataset.R
index 54ac30e56b..78b59ecc24 100644
--- a/r/R/dataset.R
+++ b/r/R/dataset.R
@@ -131,7 +131,7 @@
 #' dir.create(tf)
 #' on.exit(unlink(tf))
 #'
-#' write_dataset(mtcars, tf, partitioning="cyl")
+#' write_dataset(mtcars, tf, partitioning = "cyl")
 #'
 #' # You can specify a directory containing the files for your dataset and
 #' # open_dataset will scan all files in your directory.
@@ -397,6 +397,11 @@ dim.Dataset <- function(x) c(x$num_rows, x$num_cols)
 #' @export
 c.Dataset <- function(...) Dataset$create(list(...))
 
+#' @export
+as.data.frame.Dataset <- function(x, row.names = NULL, optional = FALSE, ...) {
+  collect.Dataset(x)
+}
+
 #' @export
 head.Dataset <- function(x, n = 6L, ...) {
   head(Scanner$create(x), n)
diff --git a/r/R/dplyr-collect.R b/r/R/dplyr-collect.R
index 8bf22728d6..395026ce78 100644
--- a/r/R/dplyr-collect.R
+++ b/r/R/dplyr-collect.R
@@ -19,19 +19,8 @@
 # The following S3 methods are registered on load if dplyr is present
 
 collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
-  tryCatch(
-out <- as_arrow_table(x),
-# n = 4 because we want the error to show up as being from collect()
-# and not augment_io_error_msg()
-error = function(e, call = caller_env(n = 4)) {
-  augment_io_error_msg(e, call, schema = x$.data$schema)
-}
-  )
-
-  if (as_data_frame) {
-out <- as.data.frame(out)
-  }
-  restore_dplyr_features(out, x)
+  out <- compute.arrow_dplyr_query(x)
+  collect.ArrowTabular(out, as_data_frame)
 }
 collect.ArrowTabular <- function(x, as_data_frame = TRUE, ...) {
   if (as_data_frame) {
@@ -40,10 +29,27 @@ collect.ArrowTabular <- function(x, as_data_frame = TRUE, 
...) {
 x
   }
 }
-collect.Dataset <- collect.RecordBatchReader <- function(x, ...) 
dplyr::collect(as_adq(x), ...)
+collect.Dataset <- function(x, as_data_frame = TRUE, ...) {
+  collect.ArrowTabular(compute.Dataset(x), as_data_frame)
+}
+collect.RecordBatchReader <- collect.Dataset
 
-compute.arrow_dplyr_query <- function(x, ...) dplyr::collect(x, as_data_frame 
= FALSE)
 compute.ArrowTabular <- function(x, ...) x
+compute.arrow_dplyr_query <- function(x, ...) {
+  # TODO: should this tryCatch move down into as_arrow_table()?
+  tryCatch(
+as_arrow_table(x),
+# n = 4 because we want the error to show up as being from compute()
+# and not augment_io_error_msg()
+error = function(e, call = caller_env(n = 4)) {
+  # Use a dummy schema() here because the CSV file reader handler is only
+  # valid when you read_csv_arrow() with a schema, but Dataset always has
+  # schema
+  # TODO: clean up this
+  augment_io_error_msg(e, call, schema = schema())
+}
+  )
+}
 compute.Dataset <- compute.RecordBatchReader <- compute.arrow_dplyr_query
 
 pull.Dataset <- function(.data,
@@ -93,27 +99,6 @@ handle_pull_as_vector <- f

[arrow] branch master updated (0e162a5499 -> f29be8020e)

2022-11-02 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 0e162a5499 ARROW-18183: [C++] cpp-micro benchmarks are failing on mac 
arm machine (#14562)
 add f29be8020e ARROW-18203: [R] Refactor to remove unnecessary uses of 
build_expr (#14553)

No new revisions were added by this update.

Summary of changes:
 r/DESCRIPTION   |   2 +
 r/R/arrow-datum.R   |  11 +-
 r/R/compute.R   | 184 --
 r/R/dplyr-datetime-helpers.R|  75 +++---
 r/R/dplyr-eval.R|   2 +-
 r/R/dplyr-funcs-conditional.R   |  55 +++--
 r/R/dplyr-funcs-datetime.R  | 141 +--
 r/R/dplyr-funcs-math.R  |  29 +--
 r/R/{expression.R => dplyr-funcs-simple.R}  | 211 ++--
 r/R/dplyr-funcs-string.R|   8 +
 r/R/dplyr-funcs-type.R  |  47 ++--
 r/R/dplyr-funcs.R   |  24 +-
 r/R/expression.R| 310 +---
 r/R/udf.R   | 200 +++
 r/man/Expression.Rd |   8 +-
 r/man/register_binding.Rd   |  20 +-
 r/man/register_scalar_function.Rd   |   2 +-
 r/tests/testthat/_snaps/{compute.md => udf.md}  |   0
 r/tests/testthat/test-dplyr-funcs-datetime.R|  47 ++--
 r/tests/testthat/{test-compute.R => test-udf.R} |   0
 20 files changed, 514 insertions(+), 862 deletions(-)
 copy r/R/{expression.R => dplyr-funcs-simple.R} (50%)
 create mode 100644 r/R/udf.R
 rename r/tests/testthat/_snaps/{compute.md => udf.md} (100%)
 rename r/tests/testthat/{test-compute.R => test-udf.R} (100%)



[arrow] branch master updated (8066c5e1f2 -> d045fc5d65)

2022-10-31 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 8066c5e1f2 ARROW-13980: [Go] Implement Scalar ApproxEquals (#14543)
 add d045fc5d65 ARROW-17462: [R] Cast scalars to type of field in 
Expression building (#13985)

No new revisions were added by this update.

Summary of changes:
 r/R/compute.R  |   2 +-
 r/R/expression.R   | 125 ++---
 r/tests/testthat/test-dataset-dplyr.R  |   6 +-
 r/tests/testthat/test-dplyr-collapse.R |   4 +-
 r/tests/testthat/test-dplyr-filter.R   |  30 
 r/tests/testthat/test-dplyr-mutate.R   |   2 +-
 r/tests/testthat/test-dplyr-query.R|  87 +++
 r/tests/testthat/test-expression.R |   5 +-
 8 files changed, 228 insertions(+), 33 deletions(-)



[arrow] branch master updated (2e84cb8f24 -> eb45b86fe8)

2022-10-22 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 2e84cb8f24 ARROW-18132: [R] Add deprecation cycle for pull() change 
(#14475)
 add eb45b86fe8 ARROW-18132: [R] Add deprecation cycle for pull() change 
(#14475)

No new revisions were added by this update.

Summary of changes:



[arrow] branch master updated (24c0fce142 -> 3a0ee3f391)

2022-10-20 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 24c0fce142 ARROW-17871: [Go] initial binary arithmetic implementation 
(#14255)
 add 3a0ee3f391 ARROW-17954: [R] Update news for 10.0 (#14337)

No new revisions were added by this update.

Summary of changes:
 r/NEWS.md | 65 +++
 r/R/dplyr-funcs-doc.R |  2 +-
 r/man/acero.Rd|  2 +-
 3 files changed, 67 insertions(+), 2 deletions(-)



[arrow] branch master updated (5f5ea7b0e1 -> cd33544533)

2022-10-17 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 5f5ea7b0e1 ARROW-18078: [Docs][R] Fix broken link in R documentation 
(#14437)
 add cd33544533 ARROW-17849: [R][Docs] Document changes due to C++17 for 
centos-7 users (#14440)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml  |   1 -
 ci/scripts/r_docker_configure.sh |  15 --
 ci/scripts/r_test.sh |  13 -
 ci/scripts/r_windows_build.sh|  44 +---
 dev/tasks/r/github.packages.yml  |  13 ++---
 r/README.md  |  23 +---
 r/configure  |  10 +++-
 r/tools/nixlibs.R| 110 +++
 r/tools/test-nixlibs.R   |  17 +-
 r/vignettes/developers/setup.Rmd |   8 +--
 r/vignettes/install.Rmd  |  93 -
 11 files changed, 156 insertions(+), 191 deletions(-)



[arrow] branch master updated (f5e592eb5e -> 0b86e40622)

2022-10-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from f5e592eb5e ARROW-15540: [C++] Allow the substrait consumer to accept 
plans with hints and nullable literals (#14402)
 add 0b86e40622 ARROW-18053: [Dev] Fix a bug that merge_arrow_pr.py doesn't 
detect Co-authored-by: (#14416)

No new revisions were added by this update.

Summary of changes:
 dev/archery/archery/utils/lint.py | 1 +
 dev/merge_arrow_pr.py | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)



[arrow] branch master updated (8b8841d4d7 -> f5e592eb5e)

2022-10-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 8b8841d4d7 ARROW-18055: [C++] arrow-dataset-dataset-writer-test still 
times out occasionally (#14428)
 add f5e592eb5e ARROW-15540: [C++] Allow the substrait consumer to accept 
plans with hints and nullable literals (#14402)

No new revisions were added by this update.

Summary of changes:
 .../arrow/engine/substrait/expression_internal.cc  |   3 +-
 .../arrow/engine/substrait/relation_internal.cc|  16 +-
 cpp/src/arrow/engine/substrait/serde.cc|   5 +-
 cpp/src/arrow/engine/substrait/serde.h |   9 +-
 cpp/src/arrow/engine/substrait/serde_test.cc   | 246 +++--
 5 files changed, 199 insertions(+), 80 deletions(-)



[arrow] branch master updated (d809c28508 -> 8b8841d4d7)

2022-10-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from d809c28508 ARROW-17965: [C++] ExecBatch support for ChunkedArray 
values (#14348)
 add 8b8841d4d7 ARROW-18055: [C++] arrow-dataset-dataset-writer-test still 
times out occasionally (#14428)

No new revisions were added by this update.

Summary of changes:
 cpp/src/arrow/dataset/dataset_writer.cc |  5 -
 cpp/src/arrow/util/async_util.cc|  3 +++
 cpp/src/arrow/util/async_util_test.cc   | 23 +--
 3 files changed, 28 insertions(+), 3 deletions(-)



[arrow] branch master updated (99b40926c7 -> f0cf5c2033)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 99b40926c7 ARROW-18058: [Dev][Archery] Remove removed ARROW_JNI 
related code (#14419)
 add f0cf5c2033 ARROW-18062: [R] error in CI jobs for R 3.5 and 3.6 when R 
package being installed (#14424)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs.R   | 12 
 r/R/dplyr-slice.R   | 14 +-
 r/tests/testthat/test-dplyr-slice.R |  4 +++-
 3 files changed, 16 insertions(+), 14 deletions(-)



[arrow] branch master updated (ee1f763084 -> 2f57194fd3)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from ee1f763084 ARROW-15838: [R] Coalesce join keys in full outer join 
(#14286)
 add 2f57194fd3 ARROW-18061: [CI][R] Reduce number of jobs on every commit 
(#14420)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)



[arrow] branch master updated (81e1fbc1de -> ee1f763084)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 81e1fbc1de ARROW-17665: [R] Document dplyr and compute functionality 
(#14387)
 add ee1f763084 ARROW-15838: [R] Coalesce join keys in full outer join 
(#14286)

No new revisions were added by this update.

Summary of changes:
 r/R/arrowExports.R |  4 +-
 r/R/dplyr-collect.R| 29 +
 r/R/dplyr-join.R   | 89 --
 r/R/query-engine.R |  7 ++-
 r/src/arrowExports.cpp |  8 ++--
 r/src/arrow_types.h|  2 +
 r/src/compute-exec.cpp | 32 +++---
 r/tests/testthat/test-dplyr-join.R | 81 --
 8 files changed, 178 insertions(+), 74 deletions(-)



[arrow] branch master updated (8972ebd812 -> 81e1fbc1de)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 8972ebd812 ARROW-17556: [C++] Unbound scan projection expression leads 
to all fields being loaded (#14264)
 add 81e1fbc1de ARROW-17665: [R] Document dplyr and compute functionality 
(#14387)

No new revisions were added by this update.

Summary of changes:
 r/R/arrow-package.R  |  26 +-
 r/R/dplyr-funcs-datetime.R   | 520 +++
 r/R/dplyr-funcs-doc.R| 104 +++---
 r/R/dplyr-funcs-string.R | 196 +-
 r/R/dplyr-funcs-type.R   |  67 ++--
 r/R/dplyr-funcs.R|   7 +-
 r/R/dplyr-summarize.R|  75 ++--
 r/data-raw/docgen.R  |  18 +-
 r/man/acero.Rd   | 104 +++---
 r/tests/testthat/test-dplyr-filter.R |   4 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R |   4 +-
 11 files changed, 621 insertions(+), 504 deletions(-)



[arrow] branch master updated: ARROW-18057: [R] tests for slice functions fail on builds without Datasets capability (#14418)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 82c26c8ebe ARROW-18057: [R] tests for slice functions fail on builds 
without Datasets capability (#14418)
82c26c8ebe is described below

commit 82c26c8ebe71def3461365a1c974ee6eccd11a06
Author: Nic Crane 
AuthorDate: Fri Oct 14 18:04:39 2022 +0100

ARROW-18057: [R] tests for slice functions fail on builds without Datasets 
capability (#14418)

Authored-by: Nic Crane 
Signed-off-by: Neal Richardson 
---
 r/tests/testthat/test-dplyr-slice.R | 1 +
 1 file changed, 1 insertion(+)

diff --git a/r/tests/testthat/test-dplyr-slice.R 
b/r/tests/testthat/test-dplyr-slice.R
index c12dd97aa4..5b577e0388 100644
--- a/r/tests/testthat/test-dplyr-slice.R
+++ b/r/tests/testthat/test-dplyr-slice.R
@@ -119,6 +119,7 @@ test_that("slice_sample, ungrouped", {
   expect_lte(sampled_n, 2)
 
   # Test with dataset, which matters for the UDF HACK
+  skip_if_not_available("dataset")
   sampled_n <- tab %>%
 InMemoryDataset$create() %>%
 slice_sample(n = 2) %>%



[arrow] branch master updated (31f2a01275 -> 883580883a)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


 from 31f2a01275 MINOR: [R][Docs] Fix the note about reading timestamp with 
timezone columns from CSV (#14413)
 add 883580883a ARROW-17485: [R] Allow TRUE/FALSE to the compression option 
of `write_feather` (`write_ipc_file`) (#13935)

No new revisions were added by this update.

Summary of changes:
 r/R/feather.R   | 7 ++-
 r/man/write_feather.Rd  | 4 +++-
 r/tests/testthat/test-feather.R | 6 ++
 3 files changed, 15 insertions(+), 2 deletions(-)
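The `write_feather()` change in ARROW-17485 above accepts `TRUE`/`FALSE` for the `compression` option. A minimal sketch of the new call forms, assuming the arrow R package is installed and built with a default compression codec:

```r
library(arrow)

tf <- tempfile(fileext = ".arrow")

# TRUE selects the default codec; FALSE writes the file uncompressed
write_feather(mtcars, tf, compression = TRUE)
write_feather(mtcars, tempfile(fileext = ".arrow"), compression = FALSE)

read_feather(tf)
```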



[arrow] branch master updated (2cbf489158 -> 31f2a01275)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 2cbf489158 ARROW-12105: [R] Replace vars_select, vars_rename with 
eval_select, eval_rename (#14371)
 add 31f2a01275 MINOR: [R][Docs] Fix the note about reading timestamp with 
timezone columns from CSV (#14413)

No new revisions were added by this update.

Summary of changes:
 r/R/csv.R | 4 ++--
 r/man/read_delim_arrow.Rd | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)



[arrow] branch master updated (d1a8f4ba19 -> d008c17e24)

2022-10-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from d1a8f4ba19 ARROW-18048: [Dev][Archery][Crossbow] Comment bot waits for 
a while before generate a report (#14412)
 add d008c17e24 ARROW-17737: [R] Groups before conversion to a Table must 
not be restored after `collect()` (#14175)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-collect.R| 13 +++--
 r/R/dplyr.R|  8 +++-
 r/tests/testthat/test-dplyr-group-by.R | 33 +
 3 files changed, 47 insertions(+), 7 deletions(-)



[arrow] branch master updated: ARROW-15602: [R][Docs] Update docs to explain how to read timestamp with timezone columns (#13877)

2022-10-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ef4b4a0ae ARROW-15602: [R][Docs] Update docs to explain how to read 
timestamp with timezone columns (#13877)
7ef4b4a0ae is described below

commit 7ef4b4a0ae0c6c15a45ec439e348e26e1e80523d
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Fri Oct 14 10:00:10 2022 +0900

ARROW-15602: [R][Docs] Update docs to explain how to read timestamp with 
timezone columns (#13877)

If users expect `read_csv_arrow` to behave the same as `readr::read_csv`, 
they will be confused by the presence or absence of a time zone, so a note 
is added to the example.
The same example is added to the tests to verify that the error occurs.

Also update the type description to link to the Arrow type documentation.

Authored-by: SHIMA Tatsuya 
Signed-off-by: Neal Richardson 
---
 r/R/csv.R   | 33 ++---
 r/man/read_delim_arrow.Rd   | 33 ++---
 r/tests/testthat/test-csv.R | 19 +--
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/r/R/csv.R b/r/R/csv.R
index 71e01971f4..7b474c137e 100644
--- a/r/R/csv.R
+++ b/r/R/csv.R
@@ -54,17 +54,17 @@
 #' single string, one character per column, where the characters map to Arrow
 #' types analogously to the `readr` type mapping:
 #'
-#' * "c": `utf8()`
-#' * "i": `int32()`
-#' * "n": `float64()`
-#' * "d": `float64()`
-#' * "l": `bool()`
-#' * "f": `dictionary()`
-#' * "D": `date32()`
-#' * "T": `timestamp(unit = "ns")`
-#' * "t": `time32()` (The `unit` arg is set to the default value `"ms"`)
-#' * "_": `null()`
-#' * "-": `null()`
+#' * "c": [utf8()]
+#' * "i": [int32()]
+#' * "n": [float64()]
+#' * "d": [float64()]
+#' * "l": [bool()]
+#' * "f": [dictionary()]
+#' * "D": [date32()]
+#' * "T": [`timestamp(unit = "ns")`][timestamp()]
+#' * "t": [time32()] (The `unit` arg is set to the default value `"ms"`)
+#' * "_": [null()]
+#' * "-": [null()]
 #' * "?": infer the type from the data
 #'
 #' If you use the compact string representation for `col_types`, you must also
@@ -143,6 +143,17 @@
 #' read_csv_arrow(tf, schema = schema(x = int32(), y = utf8()), skip = 1)
 #' read_csv_arrow(tf, col_types = schema(y = utf8()))
 #' read_csv_arrow(tf, col_types = "ic", col_names = c("x", "y"), skip = 1)
+#'
+#' # Note that if a timestamp column contains time zones, type inference won't 
work,
+#' # whether automatic or via the string "T" `col_types` specification.
+#' # To parse timestamps with time zones, provide a [Schema] to `col_types`
+#' # and specify the time zone in the type object:
+#' tf <- tempfile()
+#' write.csv(data.frame(x = "1970-01-01T12:00:00+12:00"), file = tf, row.names 
= FALSE)
+#' read_csv_arrow(
+#'   tf,
+#'   col_types = schema(x = timestamp(unit = "us", timezone = "UTC"))
+#' )
 read_delim_arrow <- function(file,
  delim = ",",
  quote = '"',
diff --git a/r/man/read_delim_arrow.Rd b/r/man/read_delim_arrow.Rd
index f322c56c17..5b91fc0ec9 100644
--- a/r/man/read_delim_arrow.Rd
+++ b/r/man/read_delim_arrow.Rd
@@ -180,17 +180,17 @@ that \code{readr} uses to the \code{col_types} argument. 
This means you provide
 single string, one character per column, where the characters map to Arrow
 types analogously to the \code{readr} type mapping:
 \itemize{
-\item "c": \code{utf8()}
-\item "i": \code{int32()}
-\item "n": \code{float64()}
-\item "d": \code{float64()}
-\item "l": \code{bool()}
-\item "f": \code{dictionary()}
-\item "D": \code{date32()}
-\item "T": \code{timestamp(unit = "ns")}
-\item "t": \code{time32()} (The \code{unit} arg is set to the default value 
\code{"ms"})
-\item "_": \code{null()}
-\item "-": \code{null()}
+\item "c": \code{\link[=utf8]{utf8()}}
+\item "i": \code{\link[=int32]{int32()}}
+\item "n": \code{\link[=float64]{float64()}}
+\item "d": \code{\link[=float64]{float64()}}
+\item "l": \code{\link[=bool]{bool()}}
+\item "f": \code{\link[=dictionary]{dictionary()}}
+\item "D": \code{\link[=date32]{date32()}}
+\item "T": \code{\link[=timestamp]{timestamp(unit = "ns")}}
+\item "t": \code{\link[=time32]{time32()}} (The \code{unit}

[arrow] branch master updated: ARROW-13766: [R] Add slice_*() methods (#14361)

2022-10-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 80e398623d ARROW-13766: [R] Add slice_*() methods (#14361)
80e398623d is described below

commit 80e398623d956304acaeb3922e367d45ed96ddec
Author: Neal Richardson 
AuthorDate: Thu Oct 13 19:59:32 2022 -0400

ARROW-13766: [R] Add slice_*() methods (#14361)

This PR implements `slice_head()`, `slice_tail()`, `slice_min()`, 
`slice_max()` and `slice_sample()`. `slice_sample()` requires a clever hack 
using a UDF because the `random()` C++ function apparently does not work; see 
ARROW-17974.

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/.lintr|   1 +
 r/DESCRIPTION   |   1 +
 r/NAMESPACE |   3 +
 r/R/array.R |   4 +-
 r/R/arrow-datum.R   |   6 ++
 r/R/arrow-package.R |  27 -
 r/R/dataset-scan.R  |  16 ++-
 r/R/dplyr-funcs-doc.R   |  17 ++--
 r/R/dplyr-funcs-type.R  |   4 +-
 r/R/dplyr-funcs.R   |  14 ++-
 r/R/dplyr-slice.R   | 158 +
 r/R/dplyr.R |  10 ++
 r/R/expression.R|   7 +-
 r/R/record-batch-reader.R   |   5 +
 r/R/util.R  |   3 +-
 r/data-raw/docgen.R |   3 +
 r/man/acero.Rd  |  17 ++--
 r/tests/testthat/test-dplyr-slice.R | 192 
 18 files changed, 464 insertions(+), 24 deletions(-)

diff --git a/r/.lintr b/r/.lintr
index 619339afca..1bd80aff4c 100644
--- a/r/.lintr
+++ b/r/.lintr
@@ -27,5 +27,6 @@ linters: linters_with_defaults(
   )
 exclusions: list(
   "R/arrowExports.R",
+  "R/dplyr-funcs-doc.R",
   "data-raw/codegen.R"
   )
diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 4b526e8b8a..5a69d46896 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -116,6 +116,7 @@ Collate:
 'dplyr-join.R'
 'dplyr-mutate.R'
 'dplyr-select.R'
+'dplyr-slice.R'
 'dplyr-summarize.R'
 'dplyr-union.R'
 'record-batch.R'
diff --git a/r/NAMESPACE b/r/NAMESPACE
index e20e61c0e3..59055ff2b7 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -421,6 +421,8 @@ importFrom(rlang,as_quosure)
 importFrom(rlang,call2)
 importFrom(rlang,call_args)
 importFrom(rlang,caller_env)
+importFrom(rlang,check_dots_empty)
+importFrom(rlang,dots_list)
 importFrom(rlang,dots_n)
 importFrom(rlang,enexpr)
 importFrom(rlang,enexprs)
@@ -472,6 +474,7 @@ importFrom(stats,na.fail)
 importFrom(stats,na.omit)
 importFrom(stats,na.pass)
 importFrom(stats,quantile)
+importFrom(stats,runif)
 importFrom(tidyselect,all_of)
 importFrom(tidyselect,contains)
 importFrom(tidyselect,ends_with)
diff --git a/r/R/array.R b/r/R/array.R
index 7c2fb5c783..c730bd742b 100644
--- a/r/R/array.R
+++ b/r/R/array.R
@@ -349,7 +349,7 @@ stop_cant_convert_array <- function(x, type) {
 "Can't create Array from object of type %s",
 paste(class(x), collapse = " / ")
   ),
-  call = rlang::caller_env()
+  call = caller_env()
 )
   } else {
 abort(
@@ -358,7 +358,7 @@ stop_cant_convert_array <- function(x, type) {
 format(type$code()),
 paste(class(x), collapse = " / ")
   ),
-  call = rlang::caller_env()
+  call = caller_env()
 )
   }
 }
diff --git a/r/R/arrow-datum.R b/r/R/arrow-datum.R
index 33c67a5285..cb3bfa57f6 100644
--- a/r/R/arrow-datum.R
+++ b/r/R/arrow-datum.R
@@ -299,6 +299,9 @@ head.ArrowDatum <- function(x, n = 6L, ...) {
   } else {
 n <- min(len, n)
   }
+  if (!is.integer(n)) {
+n <- floor(n)
+  }
   if (n == len) {
 return(x)
   }
@@ -310,6 +313,9 @@ head.ArrowDatum <- function(x, n = 6L, ...) {
 tail.ArrowDatum <- function(x, n = 6L, ...) {
   assert_is(n, c("numeric", "integer"))
   assert_that(length(n) == 1)
+  if (!is.integer(n)) {
+n <- floor(n)
+  }
   len <- NROW(x)
   if (n < 0) {
 # tail(x, negative) means all but the first n rows
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 143f4c191b..477fa67e7c 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -26,7 +26,7 @@
 #' @importFrom rlang expr caller_env is_character quo_name is_quosure enexpr 
enexprs as_quosure
 #' @importFrom rlang is_list call2 is_empty as_function as_label arg_match 
is_symbol is_call call_args
 #' @importFrom rlang quo_set_env quo_get_env is_formula quo_is_call f_rhs 
parse_expr f_env new_quosure
-#' @importFrom rlang new_quosures expr_text
+#' @importFrom rlang new_quosures expr_text caller_env check_dots_empty 
dots_list
#' @importFrom tidyselect vars_pull vars_rename
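The `slice_*()` methods added by ARROW-13766 above can be exercised on an Arrow Table. A minimal sketch, assuming an arrow version that includes this commit; `slice_sample()` uses the UDF workaround described in the commit message:

```r
library(arrow)
library(dplyr)

tab <- arrow_table(mtcars)

tab %>% slice_head(n = 3) %>% collect()     # first 3 rows
tab %>% slice_max(mpg, n = 2) %>% collect() # 2 rows with highest mpg
tab %>% slice_sample(n = 2) %>% collect()   # 2 random rows
```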

[arrow] branch master updated (66e8ba5a1e -> 959a9d5dee)

2022-10-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 66e8ba5a1e MINOR: [R][Docs] Add note about conversion from JSON types 
to Arrow types (#13871)
 add 959a9d5dee ARROW-17788: [R][Doc] Add example of using Scanner (#14184)

No new revisions were added by this update.

Summary of changes:
 r/R/dataset-scan.R| 24 +++-
 r/R/dataset.R |  5 ++---
 r/man/Scanner.Rd  | 27 ++-
 r/man/open_dataset.Rd |  5 ++---
 4 files changed, 53 insertions(+), 8 deletions(-)



[arrow] branch master updated: MINOR: [R][Docs] Add note about conversion from JSON types to Arrow types (#13871)

2022-10-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 66e8ba5a1e MINOR: [R][Docs] Add note about conversion from JSON types 
to Arrow types (#13871)
66e8ba5a1e is described below

commit 66e8ba5a1e07eaee19f040aa4df5a840614ed790
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Thu Oct 13 22:18:49 2022 +0900

MINOR: [R][Docs] Add note about conversion from JSON types to Arrow types 
(#13871)

Add note about conversion from JSON types to Arrow types.
These documents were copied from `docs/source/python/json.rst` with 
modifications.

Also, show the data frame in the example to make it easier to understand 
how the conversion is performed.

Authored-by: SHIMA Tatsuya 
Signed-off-by: Neal Richardson 
---
 r/R/json.R   | 16 ++--
 r/man/read_json_arrow.Rd | 18 --
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/r/R/json.R b/r/R/json.R
index 2b1f4916cb..c4061f066b 100644
--- a/r/R/json.R
+++ b/r/R/json.R
@@ -21,7 +21,19 @@
 #' data frame or Arrow Table.
 #'
 #' If passed a path, will detect and handle compression from the file extension
-#' (e.g. `.json.gz`). Accepts explicit or implicit nulls.
+#' (e.g. `.json.gz`).
+#'
+#' If `schema` is not provided, Arrow data types are inferred from the data:
+#' - JSON null values convert to the [null()] type, but can fall back to any 
other type.
+#' - JSON booleans convert to [boolean()].
+#' - JSON numbers convert to [int64()], falling back to [float64()] if a 
non-integer is encountered.
+#' - JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert 
to [`timestamp(unit = "s")`][timestamp()],
+#'   falling back to [utf8()] if a conversion error occurs.
+#' - JSON arrays convert to a [list_of()] type, and inference proceeds 
recursively on the JSON arrays' values.
+#' - Nested JSON objects convert to a [struct()] type, and inference proceeds 
recursively on the JSON objects' values.
+#'
+#' When `as_data_frame = FALSE`, Arrow types are further converted to R types.
+#' See `vignette("arrow", package = "arrow")` for details.
 #'
 #' @inheritParams read_delim_arrow
 #' @param schema [Schema] that describes the table.
@@ -37,7 +49,7 @@
 #' { "hello": 3.25, "world": null }
 #' { "hello": 0.0, "world": true, "yo": null }
 #'   ', tf, useBytes = TRUE)
-#' df <- read_json_arrow(tf)
+#' read_json_arrow(tf)
 read_json_arrow <- function(file,
 col_select = NULL,
 as_data_frame = TRUE,
diff --git a/r/man/read_json_arrow.Rd b/r/man/read_json_arrow.Rd
index 2ad600725f..cc821c3301 100644
--- a/r/man/read_json_arrow.Rd
+++ b/r/man/read_json_arrow.Rd
@@ -41,7 +41,21 @@ data frame or Arrow Table.
 }
 \details{
 If passed a path, will detect and handle compression from the file extension
-(e.g. \code{.json.gz}). Accepts explicit or implicit nulls.
+(e.g. \code{.json.gz}).
+
+If \code{schema} is not provided, Arrow data types are inferred from the data:
+\itemize{
+\item JSON null values convert to the \code{\link[=null]{null()}} type, but 
can fall back to any other type.
+\item JSON booleans convert to \code{\link[=boolean]{boolean()}}.
+\item JSON numbers convert to \code{\link[=int64]{int64()}}, falling back to 
\code{\link[=float64]{float64()}} if a non-integer is encountered.
+\item JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert 
to \code{\link[=timestamp]{timestamp(unit = "s")}},
+falling back to \code{\link[=utf8]{utf8()}} if a conversion error occurs.
+\item JSON arrays convert to a \code{\link[=list_of]{list_of()}} type, and 
inference proceeds recursively on the JSON arrays' values.
+\item Nested JSON objects convert to a \code{\link[=struct]{struct()}} type, 
and inference proceeds recursively on the JSON objects' values.
+}
+
+When \code{as_data_frame = FALSE}, Arrow types are further converted to R 
types.
+See \code{vignette("arrow", package = "arrow")} for details.
 }
 \examples{
 \dontshow{if (arrow_with_json()) (if (getRversion() >= "3.4") withAutoprint 
else force)(\{ # examplesIf}
@@ -52,6 +66,6 @@ writeLines('
 { "hello": 3.25, "world": null }
 { "hello": 0.0, "world": true, "yo": null }
   ', tf, useBytes = TRUE)
-df <- read_json_arrow(tf)
+read_json_arrow(tf)
 \dontshow{\}) # examplesIf}
 }
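The inference rules documented in the diff above can be observed directly. This sketch reuses the example file from the diff, assuming the arrow R package is built with JSON support:

```r
library(arrow)

tf <- tempfile()
writeLines('
  { "hello": 3.25, "world": null }
  { "hello": 0.0, "world": true, "yo": null }
', tf, useBytes = TRUE)

# By the rules above: hello -> float64(), world -> bool(), yo -> null()
read_json_arrow(tf)
```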



[arrow] branch master updated: MINOR: [R][Docs] Add note about using Schema as the `col_types` argument of `read_csv_arrow` (#13872)

2022-10-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a47cd526d7 MINOR: [R][Docs] Add note about using Schema as the 
`col_types` argument of `read_csv_arrow` (#13872)
a47cd526d7 is described below

commit a47cd526d7cfd28632c0ff92c97b59920f4ebb01
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Thu Oct 13 22:18:30 2022 +0900

MINOR: [R][Docs] Add note about using Schema as the `col_types` argument of 
`read_csv_arrow` (#13872)

Authored-by: SHIMA Tatsuya 
Signed-off-by: Neal Richardson 
---
 r/R/csv.R | 4 ++--
 r/man/read_delim_arrow.Rd | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/r/R/csv.R b/r/R/csv.R
index 4523298416..71e01971f4 100644
--- a/r/R/csv.R
+++ b/r/R/csv.R
@@ -98,8 +98,8 @@
 #' column names and will not be included in the data frame. If `FALSE`, column
 #' names will be generated by Arrow, starting with "f0", "f1", ..., "fN".
 #' Alternatively, you can specify a character vector of column names.
-#' @param col_types A compact string representation of the column types, or
-#' `NULL` (the default) to infer types from the data.
+#' @param col_types A compact string representation of the column types,
+#' an Arrow [Schema], or `NULL` (the default) to infer types from the data.
 #' @param col_select A character vector of column names to keep, as in the
 #' "select" argument to `data.table::fread()`, or a
 #' [tidy selection specification][tidyselect::vars_select()]
diff --git a/r/man/read_delim_arrow.Rd b/r/man/read_delim_arrow.Rd
index 997a7f4101..f322c56c17 100644
--- a/r/man/read_delim_arrow.Rd
+++ b/r/man/read_delim_arrow.Rd
@@ -96,8 +96,8 @@ column names and will not be included in the data frame. If 
\code{FALSE}, column
 names will be generated by Arrow, starting with "f0", "f1", ..., "fN".
 Alternatively, you can specify a character vector of column names.}
 
-\item{col_types}{A compact string representation of the column types, or
-\code{NULL} (the default) to infer types from the data.}
+\item{col_types}{A compact string representation of the column types,
+an Arrow \link{Schema}, or \code{NULL} (the default) to infer types from the 
data.}
 
 \item{col_select}{A character vector of column names to keep, as in the
 "select" argument to \code{data.table::fread()}, or a



[arrow] branch master updated (093a4fe346 -> 20626f833b)

2022-10-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 093a4fe346 ARROW-17971: [Format][Docs] Add ADBC (#14079)
 add 20626f833b ARROW-17439: [R] Change behavior of pull to compute instead 
of collect (#14330)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-collect.R  |  8 --
 r/tests/testthat/test-dataset-write.R|  4 ++-
 r/tests/testthat/test-dataset.R  | 41 
 r/tests/testthat/test-dplyr-arrange.R|  3 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R |  3 +-
 r/tests/testthat/test-dplyr-query.R  |  9 +++---
 6 files changed, 47 insertions(+), 21 deletions(-)



[arrow] branch master updated (e8afe800aa -> fa3cf78e3f)

2022-10-11 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from e8afe800aa ARROW-17988: [C++] Remove index_sequence_for and 
aligned_union backports (#14372)
 add fa3cf78e3f MINOR: [R][CI] Fix typo in docker configure script (#14374)

No new revisions were added by this update.

Summary of changes:
 ci/scripts/r_docker_configure.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[arrow] branch master updated: ARROW-17885: [R] Return BLOB data as list of raw instead of a list of integers (#14277)

2022-10-10 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 73cfd2d0d0 ARROW-17885: [R] Return BLOB data as list of raw instead of 
a list of integers (#14277)
73cfd2d0d0 is described below

commit 73cfd2d0d0e1e5a2192fb73e5262c77953664f81
Author: Dewey Dunnington 
AuthorDate: Mon Oct 10 17:08:34 2022 -0300

ARROW-17885: [R] Return BLOB data as list of raw instead of a list of 
integers (#14277)

This PR adds support for `blob::blob()`, which is common in R database land 
to denote "binary", and `vctrs::list_of()`, which is similar, easy, and helps a 
bit with lists of things that happen to be all NULL.

We have our own infrastructure for binary and lists of things too, which I 
assume pre-dates the mature vctrs and blob? Should we consider having 
`as.vector()` output those objects instead of the custom 
`arrow_list/large_list/binary` classes we implement here?

Lead-authored-by: Dewey Dunnington 
Co-authored-by: Dewey Dunnington 
Signed-off-by: Neal Richardson 
---
 r/DESCRIPTION|  1 +
 r/NAMESPACE  |  4 +++
 r/R/array.R  | 20 +
 r/R/type.R   | 14 +
 r/src/r_to_arrow.cpp |  2 +-
 r/src/type_infer.cpp | 29 +++---
 r/tests/testthat/_snaps/Array.md |  8 +
 r/tests/testthat/test-Array.R| 64 +++-
 r/tests/testthat/test-type.R | 32 
 9 files changed, 161 insertions(+), 13 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index cf83f56390..4b526e8b8a 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -45,6 +45,7 @@ RoxygenNote: 7.2.1
 Config/testthat/edition: 3
 VignetteBuilder: knitr
 Suggests:
+blob,
 cli,
 DBI,
 dbplyr,
diff --git a/r/NAMESPACE b/r/NAMESPACE
index 8b08b940b3..24a9e14bb6 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -41,9 +41,11 @@ S3method(as.vector,ArrowDatum)
 S3method(as_arrow_array,Array)
 S3method(as_arrow_array,ChunkedArray)
 S3method(as_arrow_array,Scalar)
+S3method(as_arrow_array,blob)
 S3method(as_arrow_array,data.frame)
 S3method(as_arrow_array,default)
 S3method(as_arrow_array,pyarrow.lib.Array)
+S3method(as_arrow_array,vctrs_list_of)
 S3method(as_arrow_table,RecordBatch)
 S3method(as_arrow_table,RecordBatchReader)
 S3method(as_arrow_table,Table)
@@ -100,7 +102,9 @@ S3method(head,Scanner)
 S3method(head,arrow_dplyr_query)
 S3method(infer_type,ArrowDatum)
 S3method(infer_type,Expression)
+S3method(infer_type,blob)
 S3method(infer_type,default)
+S3method(infer_type,vctrs_list_of)
 S3method(is.finite,ArrowDatum)
 S3method(is.infinite,ArrowDatum)
 S3method(is.na,ArrowDatum)
diff --git a/r/R/array.R b/r/R/array.R
index 938c8e4b04..7c2fb5c783 100644
--- a/r/R/array.R
+++ b/r/R/array.R
@@ -322,6 +322,26 @@ as_arrow_array.data.frame <- function(x, ..., type = NULL) 
{
   }
 }
 
+#' @export
+as_arrow_array.vctrs_list_of <- function(x, ..., type = NULL) {
+  type <- type %||% infer_type(x)
+  if (!inherits(type, "ListType") && !inherits(type, "LargeListType")) {
+stop_cant_convert_array(x, type)
+  }
+
+  as_arrow_array(unclass(x), type = type)
+}
+
+#' @export
+as_arrow_array.blob <- function(x, ..., type = NULL) {
+  type <- type %||% infer_type(x)
+  if (!type$Equals(binary()) && !type$Equals(large_binary())) {
+stop_cant_convert_array(x, type)
+  }
+
+  as_arrow_array(unclass(x), type = type)
+}
+
 stop_cant_convert_array <- function(x, type) {
   if (is.null(type)) {
 abort(
diff --git a/r/R/type.R b/r/R/type.R
index d4d7d52ad5..5089789f6c 100644
--- a/r/R/type.R
+++ b/r/R/type.R
@@ -111,6 +111,20 @@ infer_type.default <- function(x, ..., 
from_array_infer_type = FALSE) {
   }
 }
 
+#' @export
+infer_type.vctrs_list_of <- function(x, ...) {
+  list_of(infer_type(attr(x, "ptype")))
+}
+
+#' @export
+infer_type.blob <- function(x, ...) {
+  if (sum(lengths(x)) > .Machine$integer.max) {
+large_binary()
+  } else {
+binary()
+  }
+}
+
 #' @export
 infer_type.ArrowDatum <- function(x, ...) x$type
 
diff --git a/r/src/r_to_arrow.cpp b/r/src/r_to_arrow.cpp
index aa51799585..c472d8286f 100644
--- a/r/src/r_to_arrow.cpp
+++ b/r/src/r_to_arrow.cpp
@@ -743,7 +743,7 @@ Status check_binary(SEXP x, int64_t size) {
   // check this is a list of raw vectors
   const SEXP* p_x = VECTOR_PTR_RO(x);
   for (R_xlen_t i = 0; i < size; i++, ++p_x) {
-if (TYPEOF(*p_x) != RAWSXP) {
+if (TYPEOF(*p_x) != RAWSXP && (*p_x != R_NilValue)) {
   return Status::Invalid("invalid R type to convert to binary");
 }
   }
diff --git a/r/src/type_infer.cpp b/r/src/type_infer.cpp
index e30d0e1288.
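The `infer_type()` methods added in the diff above choose between `binary()` and `large_binary()` based on total payload size. A minimal sketch, assuming the blob package is installed:

```r
library(arrow)

x <- blob::blob(as.raw(1:3), as.raw(4:8))

# infer_type.blob() returns binary() while the summed lengths fit in
# .Machine$integer.max bytes, and large_binary() beyond that
infer_type(x)
as_arrow_array(x)
```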

[arrow] branch master updated (7f63ee5033 -> 76d6cbb5c5)

2022-10-10 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 7f63ee5033 ARROW-17976: [C++] Use generic lambdas in arrow/compare.cc 
(#14363)
 add 76d6cbb5c5 ARROW-17594: [R][Packaging] Build binaries with devtoolset 
8 on CentOS 7 (#14243)

No new revisions were added by this update.

Summary of changes:
 ci/docker/centos-7-cpp.dockerfile | 29 
 ci/scripts/r_docker_configure.sh  |  6 +
 dev/tasks/macros.jinja|  7 +++---
 dev/tasks/r/github.packages.yml   | 47 ++-
 dev/tasks/tasks.yml   |  4 +---
 docker-compose.yml|  7 ++
 r/inst/build_arrow_static.sh  |  4 +++-
 r/tools/nixlibs.R |  2 +-
 8 files changed, 69 insertions(+), 37 deletions(-)



[arrow] branch master updated (5aff7a5b76 -> c93a10b3d2)

2022-10-04 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 5aff7a5b76 ARROW-17930: [CI][C++] Valgrind failure in 
PrintValue (#14317)
 add c93a10b3d2 MINOR: [R] Adapt stringr::str_c mapping for upcoming 
release (#14296)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-string.R   |  3 +++
 r/tests/testthat/test-dplyr-funcs-string.R | 10 --
 2 files changed, 7 insertions(+), 6 deletions(-)



[arrow] branch master updated (b7f9dfc2b1 -> 776626e56b)

2022-10-04 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from b7f9dfc2b1 ARROW-16879: [R][CI] Test R GCS bindings with testbench 
(#13542)
 add 776626e56b ARROW-17903: [JS] Update dependencies (#14285)

No new revisions were added by this update.

Summary of changes:
 js/package.json|   42 +-
 js/src/builder/list.ts |2 +-
 js/src/io/adapters.ts  |6 +-
 js/src/util/buffer.ts  |2 +-
 js/yarn.lock   | 2373 +---
 5 files changed, 1251 insertions(+), 1174 deletions(-)



[arrow] branch master updated (4660180848 -> b7f9dfc2b1)

2022-10-04 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


 from 4660180848 ARROW-17450: [C++][Parquet] Support RLE decode for boolean 
datatype (#14147)
 add b7f9dfc2b1 ARROW-16879: [R][CI] Test R GCS bindings with testbench 
(#13542)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/r.yml   |  13 ++
 ci/scripts/r_test.sh  |   8 -
 r/DESCRIPTION |   1 +
 r/R/filesystem.R  |   9 +-
 r/tests/testthat/helper-filesystems.R | 190 
 r/tests/testthat/helper-skip.R|  15 +-
 r/tests/testthat/test-gcs.R   |  48 +
 r/tests/testthat/test-s3-minio.R  | 329 +-
 8 files changed, 360 insertions(+), 253 deletions(-)
 create mode 100644 r/tests/testthat/helper-filesystems.R



[arrow] branch master updated (2748f3d9fa -> d60d8c6dd4)

2022-09-30 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 2748f3d9fa MINOR: [CI] Use secrets for bucket name in preview-docs job 
(#14270)
 add d60d8c6dd4 ARROW-17848: [R] Skip lubridate::format_ISO8601 tests until 
next release (#14282)

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/test-dplyr-funcs-datetime.R | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)



[arrow] branch master updated (7a3d801095 -> 7a56846811)

2022-09-27 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 7a3d801095 ARROW-17669: [Go] Take Function kernels for Record batch, 
Tables and Chunked Arrays (#14214)
 add 7a56846811 MINOR: [R] Import the missing `rlang::quo` function (#14091)

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE | 1 +
 r/R/arrow-package.R | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)



[arrow] branch master updated (4f31bfc2ff -> 2577ac1a10)

2022-09-20 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 4f31bfc2ff ARROW-17318: [C++][Dataset] Support async streaming 
interface for getting fragments in Dataset (#13804)
 add 2577ac1a10 ARROW-17690: [R] Implement dplyr::across() inside 
distinct() (#14154)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-funcs-doc.R  |  2 +-
 r/data-raw/docgen.R|  4 ++--
 r/man/acero.Rd |  2 +-
 r/tests/testthat/test-dplyr-distinct.R | 10 ++
 4 files changed, 14 insertions(+), 4 deletions(-)
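A minimal sketch of what this change enables (the table and column names are illustrative, not taken from the patch):

```r
library(arrow)
library(dplyr)

tbl <- arrow_table(x = c(1, 1, 2), y = c("a", "a", "b"))

# across() with a tidyselect helper can now be used inside distinct()
# on an Arrow query, mirroring the dplyr behavior
tbl %>%
  distinct(across(starts_with("x"))) %>%
  collect()
```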



[arrow] branch master updated (529f653dfa -> 7969164930)

2022-09-19 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 529f653dfa ARROW-17517: [C++] Remove internal headers from substrait 
API (#14131)
 add 7969164930 MINOR: [R] Forward compatibility for tidyselect 1.2 (#14170)

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/test-dplyr-filter.R | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)



[arrow] branch master updated: MINOR: [R] Fix lint warnings and run styler over everything (#14153)

2022-09-16 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bc2e010d9 MINOR: [R] Fix lint warnings and run styler over everything 
(#14153)
6bc2e010d9 is described below

commit 6bc2e010d9fb4e50d8a9490ec5fa092f2f8783b4
Author: Neal Richardson 
AuthorDate: Fri Sep 16 13:18:38 2022 -0400

MINOR: [R] Fix lint warnings and run styler over everything (#14153)

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/DESCRIPTION|   2 +-
 r/R/arrowExports.R   |   1 -
 r/R/dplyr-datetime-helpers.R |   8 +-
 r/R/dplyr-funcs-doc.R|  28 +++
 r/R/dplyr.R  |   3 +-
 r/data-raw/docgen.R  |  12 +--
 r/man/acero.Rd   |   4 +-
 r/man/show_exec_plan.Rd  |   2 +-
 r/tests/testthat/test-Table.R|   1 -
 r/tests/testthat/test-compute.R  |   2 +-
 r/tests/testthat/test-dataset-dplyr.R|  40 +-
 r/tests/testthat/test-dataset.R  |   4 +-
 r/tests/testthat/test-dplyr-across.R |   1 -
 r/tests/testthat/test-dplyr-funcs-datetime.R | 109 ++-
 r/tests/testthat/test-dplyr-funcs-math.R |   3 +-
 r/tests/testthat/test-dplyr-funcs-string.R   |   3 +-
 r/tests/testthat/test-dplyr-funcs-type.R |   2 +-
 r/tools/winlibs.R|   2 +-
 18 files changed, 100 insertions(+), 127 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 7b60f0c510..90e84d34bc 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -41,7 +41,7 @@ Imports:
 utils,
 vctrs
 Roxygen: list(markdown = TRUE, r6 = FALSE, load = "source")
-RoxygenNote: 7.2.0
+RoxygenNote: 7.2.1
 Config/testthat/edition: 3
 VignetteBuilder: knitr
 Suggests:
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 6e76cd6468..35c73e547c 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -2043,4 +2043,3 @@ SetIOThreadPoolCapacity <- function(threads) {
 Array__infer_type <- function(x) {
   .Call(`_arrow_Array__infer_type`, x)
 }
-
diff --git a/r/R/dplyr-datetime-helpers.R b/r/R/dplyr-datetime-helpers.R
index 4c9a8d1bf0..ba9bb0d543 100644
--- a/r/R/dplyr-datetime-helpers.R
+++ b/r/R/dplyr-datetime-helpers.R
@@ -442,8 +442,10 @@ parse_period_unit <- function(x) {
   str_unit <- substr(x, capture_start[[2]], capture_end[[2]])
   str_multiple <- substr(x, capture_start[[1]], capture_end[[1]])
 
-  known_units <- c("nanosecond", "microsecond", "millisecond", "second",
-   "minute", "hour", "day", "week", "month", "quarter", "year")
+  known_units <- c(
+"nanosecond", "microsecond", "millisecond", "second",
+"minute", "hour", "day", "week", "month", "quarter", "year"
+  )
 
   # match the period unit
   str_unit_start <- substr(str_unit, 1, 3)
@@ -464,7 +466,7 @@ parse_period_unit <- function(x) {
   if (capture_length[[1]] == 0) {
 multiple <- 1L
 
-  # otherwise parse the multiple
+# otherwise parse the multiple
   } else {
 multiple <- as.numeric(str_multiple)
 
diff --git a/r/R/dplyr-funcs-doc.R b/r/R/dplyr-funcs-doc.R
index cac0310f49..cbfe475232 100644
--- a/r/R/dplyr-funcs-doc.R
+++ b/r/R/dplyr-funcs-doc.R
@@ -88,12 +88,12 @@
 #' as `arrow_ascii_is_decimal`.
 #'
 #' ## arrow
-#' 
+#'
 #' * [`add_filename()`][arrow::add_filename()]
 #' * [`cast()`][arrow::cast()]
 #'
 #' ## base
-#' 
+#'
 #' * [`-`][-()]
 #' * [`!`][!()]
 #' * [`!=`][!=()]
@@ -179,13 +179,15 @@
 #' * [`trunc()`][base::trunc()]
 #'
 #' ## bit64
-#' 
+#'
 #' * [`as.integer64()`][bit64::as.integer64()]
 #' * [`is.integer64()`][bit64::is.integer64()]
 #'
 #' ## dplyr
-#' 
-#' * [`across()`][dplyr::across()]: only supported inside `mutate()`, 
`summarize()`, and `arrange()`; purrr-style lambda functions and use of 
`where()` selection helper not yet supported
+#'
+#' * [`across()`][dplyr::across()]: supported inside `mutate()`, 
`summarize()`, `group_by()`, and `arrange()`;
+#' purrr-style lambda functions
+#' and use of `where()` selection helper not yet supported
 #' * [`between()`][dplyr::between()]
 #' * [`case_when()`][dplyr::case_when()]
 #' * [`coalesce()`][dplyr::coalesce()]
@@ -195,7 +197,7 @@
 #' * [`n_distinct()`][dplyr::n_distinct()]
 #'
 #' ## lubridate
-#' 
+#'
 #' * [`am()`][lubridate::am()]
 #' * [`as_date()`][lubridate::as_date()]
 #' * [`as_datetime()`][lubridate::as_datetime()]
@@ -270,11 +272,11 @@
 #' * [`yq()`][lubridate::yq()]
 #'
 #' ## methods
-#' 
+#'
 #' * [`i

[arrow] branch master updated (b48d2287be -> 6926672147)

2022-09-16 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from b48d2287be ARROW-17704: [Java][FlightRPC] Update to Junit 5 (#14103)
 add 6926672147 ARROW-17643: [R] Latest duckdb release is causing test 
failure (#14149)

No new revisions were added by this update.

Summary of changes:
 r/tests/testthat/test-duckdb.R | 4 
 1 file changed, 4 insertions(+)



[arrow] branch master updated (2e72e0a808 -> 93626eebd0)

2022-09-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 2e72e0a808 ARROW-17407: [Doc][FlightRPC] Flight/gRPC best practices 
(#13873)
 add 93626eebd0 ARROW-15011: [R] Generate documentation for dplyr function 
bindings (#14014)

No new revisions were added by this update.

Summary of changes:
 r/DESCRIPTION   |   1 +
 r/Makefile  |   1 +
 r/R/arrow-package.R |  51 +--
 r/R/dplyr-funcs-augmented.R |  19 ++-
 r/R/dplyr-funcs-datetime.R  |  53 ---
 r/R/dplyr-funcs-doc.R   | 332 +++
 r/R/dplyr-funcs-string.R|  86 ++-
 r/R/dplyr-funcs-type.R  |  43 +++---
 r/R/dplyr-funcs.R   |  17 ++-
 r/R/expression.R|  11 +-
 r/_pkgdown.yml  |   1 +
 r/data-raw/docgen.R | 192 +
 r/man/acero.Rd  | 339 
 r/man/add_filename.Rd   |  23 +++
 r/man/cast.Rd   |  38 +
 r/man/register_binding.Rd   |  11 +-
 16 files changed, 1109 insertions(+), 109 deletions(-)
 create mode 100644 r/R/dplyr-funcs-doc.R
 create mode 100644 r/data-raw/docgen.R
 create mode 100644 r/man/acero.Rd
 create mode 100644 r/man/add_filename.Rd
 create mode 100644 r/man/cast.Rd



[arrow] branch master updated (5c773bb922 -> 5c13049d97)

2022-09-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 5c773bb922 ARROW-17673: [R] `desc` in `dplyr::arrange` should allow 
`dplyr::` prefix (#14090)
 add 5c13049d97 ARROW-16190: [CI][R] Implement CI on Apple M1 for R (#14099)

No new revisions were added by this update.

Summary of changes:
 dev/tasks/macros.jinja   |  4 ++--
 dev/tasks/python-wheels/github.osx.arm64.yml | 16 
 dev/tasks/r/github.macos.autobrew.yml|  4 ++--
 dev/tasks/r/github.packages.yml  | 28 +---
 dev/tasks/tasks.yml  |  3 ++-
 dev/tasks/verify-rc/github.macos.arm64.yml   |  2 +-
 6 files changed, 32 insertions(+), 25 deletions(-)



[arrow] branch master updated (d8f64eecf3 -> 5c773bb922)

2022-09-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from d8f64eecf3 ARROW-17172: [C++][Python] test_cython_api fails on windows 
(#14133)
 add 5c773bb922 ARROW-17673: [R] `desc` in `dplyr::arrange` should allow 
`dplyr::` prefix (#14090)

No new revisions were added by this update.

Summary of changes:
 r/R/dplyr-arrange.R   |  2 +-
 r/tests/testthat/test-dplyr-arrange.R | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)
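A short sketch of the fix (example data is illustrative): previously only a bare `desc()` was recognized inside `arrange()`; the namespaced form now works as well.

```r
library(arrow)
library(dplyr)

tbl <- arrow_table(x = c(3, 1, 2))

tbl %>%
  arrange(dplyr::desc(x)) %>%  # the dplyr:: prefix is now accepted
  collect()
```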



[arrow] branch master updated (05b7fe35cf -> 6c675c3534)

2022-09-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 05b7fe35cf ARROW-17674: [R] Implement dplyr::across() inside arrange() 
(#14092)
 add 6c675c3534 ARROW-15481: [R] [CI] Add a crossbow job that mimics CRAN's 
old macOS (#13925)

No new revisions were added by this update.

Summary of changes:
 dev/tasks/macros.jinja|  3 +++
 dev/tasks/r/github.macos.autobrew.yml |  2 +-
 dev/tasks/r/github.packages.yml   | 34 +++---
 r/tools/autobrew  |  2 +-
 4 files changed, 28 insertions(+), 13 deletions(-)



[arrow] branch master updated: ARROW-17674: [R] Implement dplyr::across() inside arrange() (#14092)

2022-09-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 05b7fe35cf ARROW-17674: [R] Implement dplyr::across() inside arrange() 
(#14092)
05b7fe35cf is described below

commit 05b7fe35cf7c0dbba4d3c86882bb93560e606a13
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Tue Sep 13 00:16:13 2022 +0900

ARROW-17674: [R] Implement dplyr::across() inside arrange() (#14092)

Authored-by: SHIMA Tatsuya 
Signed-off-by: Neal Richardson 
---
 r/R/dplyr-arrange.R   |  3 ++-
 r/tests/testthat/test-dplyr-arrange.R | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/r/R/dplyr-arrange.R b/r/R/dplyr-arrange.R
index 247a539f52..2f9ef61bb3 100644
--- a/r/R/dplyr-arrange.R
+++ b/r/R/dplyr-arrange.R
@@ -20,7 +20,8 @@
 
 arrange.arrow_dplyr_query <- function(.data, ..., .by_group = FALSE) {
   call <- match.call()
-  exprs <- quos(...)
+  exprs <- expand_across(.data, quos(...))
+
   if (.by_group) {
 # when the data is grouped and .by_group is TRUE, order the result by
 # the grouping columns first
diff --git a/r/tests/testthat/test-dplyr-arrange.R 
b/r/tests/testthat/test-dplyr-arrange.R
index fee1475a44..edec572d10 100644
--- a/r/tests/testthat/test-dplyr-arrange.R
+++ b/r/tests/testthat/test-dplyr-arrange.R
@@ -201,3 +201,18 @@ test_that("arrange() with bad inputs", {
 fixed = TRUE
   )
 })
+
+test_that("Can use across() within arrange()", {
+  compare_dplyr_binding(
+.input %>%
+  arrange(across(starts_with("d"))) %>%
+  collect(),
+example_data
+  )
+  compare_dplyr_binding(
+.input %>%
+  arrange(across(starts_with("d"), desc)) %>%
+  collect(),
+example_data
+  )
+})



[arrow] branch master updated (1b9c57e208 -> 80bba29961)

2022-08-27 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 1b9c57e208 ARROW-17453: [Go][C++][Parquet] Inconsistent Data with 
Repetition Levels (#13982)
 add 80bba29961 ARROW-17463: [R] Avoid unnecessary projections (#13954)

No new revisions were added by this update.

Summary of changes:
 r/R/query-engine.R  | 24 --
 r/tests/testthat/test-dplyr-collapse.R  | 36 +++
 r/tests/testthat/test-dplyr-query.R | 82 +
 r/tests/testthat/test-dplyr-summarize.R | 41 -
 4 files changed, 147 insertions(+), 36 deletions(-)



[arrow] branch master updated: ARROW-15260: [R] open_dataset - add file_name as column (#12826)

2022-08-09 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 838687178f ARROW-15260: [R] open_dataset - add file_name as column 
(#12826)
838687178f is described below

commit 838687178fda7f82e31668f502e2f94071ce8077
Author: Nic Crane 
AuthorDate: Wed Aug 10 01:19:40 2022 +0100

ARROW-15260: [R] open_dataset - add file_name as column (#12826)

Authored-by: Nic Crane 
Signed-off-by: Neal Richardson 
---
 r/DESCRIPTION   |  1 +
 r/R/dataset.R   |  1 +
 r/R/dplyr-collect.R | 11 +
 r/R/dplyr-funcs-augmented.R | 22 ++
 r/R/dplyr-funcs.R   |  1 +
 r/R/dplyr.R |  3 ++
 r/R/util.R  | 31 +-
 r/src/compute-exec.cpp  |  8 ++--
 r/tests/testthat/test-dataset.R | 94 -
 9 files changed, 164 insertions(+), 8 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 308a7ec3fa..95c1405869 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -98,6 +98,7 @@ Collate:
 'dplyr-distinct.R'
 'dplyr-eval.R'
 'dplyr-filter.R'
+'dplyr-funcs-augmented.R'
 'dplyr-funcs-conditional.R'
 'dplyr-funcs-datetime.R'
 'dplyr-funcs-math.R'
diff --git a/r/R/dataset.R b/r/R/dataset.R
index 12765fbfc0..d86962cc1d 100644
--- a/r/R/dataset.R
+++ b/r/R/dataset.R
@@ -224,6 +224,7 @@ open_dataset <- function(sources,
 # and not handle_parquet_io_error()
 error = function(e, call = caller_env(n = 4)) {
   handle_parquet_io_error(e, format, call)
+  abort(conditionMessage(e), call = call)
 }
   )
 }
diff --git a/r/R/dplyr-collect.R b/r/R/dplyr-collect.R
index 3e83475a8c..8049e46eb5 100644
--- a/r/R/dplyr-collect.R
+++ b/r/R/dplyr-collect.R
@@ -25,6 +25,8 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = 
TRUE, ...) {
 # and not handle_csv_read_error()
 error = function(e, call = caller_env(n = 4)) {
   handle_csv_read_error(e, x$.data$schema, call)
+  handle_augmented_field_misuse(e, call)
+  abort(conditionMessage(e), call = call)
 }
   )
 
@@ -104,10 +106,18 @@ add_suffix <- function(fields, common_cols, suffix) {
 }
 
 implicit_schema <- function(.data) {
+  # Get the source data schema so that we can evaluate expressions to determine
+  # the output schema. Note that we don't use source_data() because we only
+  # want to go one level up (where we may have called implicit_schema() before)
   .data <- ensure_group_vars(.data)
   old_schm <- .data$.data$schema
+  # Add in any augmented fields that may exist in the query but not in the
+  # real data, in case we have FieldRefs to them
+  old_schm[["__filename"]] <- string()
 
   if (is.null(.data$aggregations)) {
+# .data$selected_columns is a named list of Expressions (FieldRefs or
+# something more complex). Bind them in order to determine their output 
type
 new_fields <- map(.data$selected_columns, ~ .$type(old_schm))
 if (!is.null(.data$join) && !(.data$join$type %in% JoinType[1:4])) {
   # Add cols from right side, except for semi/anti joins
@@ -128,6 +138,7 @@ implicit_schema <- function(.data) {
   new_fields <- c(left_fields, right_fields)
 }
   } else {
+# The output schema is based on the aggregations and any group_by vars
 new_fields <- map(summarize_projection(.data), ~ .$type(old_schm))
 # * Put group_by_vars first (this can't be done by summarize,
 #   they have to be last per the aggregate node signature,
diff --git a/r/R/dplyr-funcs-augmented.R b/r/R/dplyr-funcs-augmented.R
new file mode 100644
index 00..6e751d49f6
--- /dev/null
+++ b/r/R/dplyr-funcs-augmented.R
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+register_bindings_augmented <- function() {
+  register_binding("add_filename", function() {
+Expression$field_ref("__filename")
+  })
+}
diff --git a/r/R/dplyr-funcs.R b/r/R/dplyr-funcs.R
index c1dcdd1774..4dadff54b4 1

[arrow] branch master updated: ARROW-17252: [R] Intermittent valgrind failure (#13773)

2022-08-09 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 7448322ebe ARROW-17252: [R] Intermittent valgrind failure (#13773)
7448322ebe is described below

commit 7448322ebe34c6efae413a52338ebf7efa1a6069
Author: Dewey Dunnington 
AuthorDate: Tue Aug 9 07:57:57 2022 -0300

ARROW-17252: [R] Intermittent valgrind failure (#13773)

This PR fixes intermittent leaks that occur after one of the changes from 
ARROW-16444: when we drain the `RecordBatchReader` that is emitted from the 
plan too quickly, it seems, some parts of the plan can leak (I don't know why 
this happens).

I tried removing various pieces of the `RunWithCapturedR()` changes (see 
#13746) but the only thing that removes the errors completely is draining the 
resulting `RecordBatchReader` from R (i.e., `reader$read_table()`) instead of 
in C++ (i.e., `reader->ToTable()`). Unfortunately, for user-defined functions 
to work in a plan we need a C++ level `reader->ToTable()`. I took the approach 
here of disabling the C++ level read by default, requiring a user to opt in to 
the version of `collect( [...]

I was able to replicate the original leaks but they are few and far 
between...our tests just happen to create and destroy many, many exec plans and 
something about the CI environment seems to trigger these more reliably 
(although the errors don't always occur there, either). Most of the leaks are 
small but there were some instances where an entire `Table` leaked.

Authored-by: Dewey Dunnington 
Signed-off-by: Neal Richardson 
---
 r/R/compute.R |  9 ++-
 r/R/table.R   |  9 ++-
 r/man/register_scalar_function.Rd |  2 +-
 r/tests/testthat/test-compute.R   | 51 ++-
 4 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/r/R/compute.R b/r/R/compute.R
index 0985e73a5f..636c9146ca 100644
--- a/r/R/compute.R
+++ b/r/R/compute.R
@@ -344,7 +344,7 @@ cast_options <- function(safe = TRUE, ...) {
 #' @return `NULL`, invisibly
 #' @export
 #'
-#' @examplesIf arrow_with_dataset()
+#' @examplesIf arrow_with_dataset() && identical(Sys.getenv("NOT_CRAN"), 
"true")
 #' library(dplyr, warn.conflicts = FALSE)
 #'
 #' some_model <- lm(mpg ~ disp + cyl, data = mtcars)
@@ -385,6 +385,13 @@ register_scalar_function <- function(name, fun, in_type, 
out_type,
 update_cache = TRUE
   )
 
+  # User-defined functions require some special handling
+  # in the query engine which currently require an opt-in using
+  # the R_ARROW_COLLECT_WITH_UDF environment variable while this
+  # behaviour is stabilized.
+  # TODO(ARROW-17178) remove the need for this!
+  Sys.setenv(R_ARROW_COLLECT_WITH_UDF = "true")
+
   invisible(NULL)
 }
 
diff --git a/r/R/table.R b/r/R/table.R
index 5579c676d5..d7e276415c 100644
--- a/r/R/table.R
+++ b/r/R/table.R
@@ -331,5 +331,12 @@ as_arrow_table.arrow_dplyr_query <- function(x, ...) {
   # See query-engine.R for ExecPlan/Nodes
   plan <- ExecPlan$create()
   final_node <- plan$Build(x)
-  plan$Run(final_node, as_table = TRUE)
+
+  run_with_event_loop <- identical(
+Sys.getenv("R_ARROW_COLLECT_WITH_UDF", ""),
+"true"
+  )
+
+  result <- plan$Run(final_node, as_table = run_with_event_loop)
+  as_arrow_table(result)
 }
diff --git a/r/man/register_scalar_function.Rd 
b/r/man/register_scalar_function.Rd
index 4da8f54f64..324dd5fad1 100644
--- a/r/man/register_scalar_function.Rd
+++ b/r/man/register_scalar_function.Rd
@@ -48,7 +48,7 @@ stateless and return output with the same shape (i.e., the 
same number
 of rows) as the input.
 }
 \examples{
-\dontshow{if (arrow_with_dataset()) (if (getRversion() >= "3.4") withAutoprint 
else force)(\{ # examplesIf}
+\dontshow{if (arrow_with_dataset() && identical(Sys.getenv("NOT_CRAN"), 
"true")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 library(dplyr, warn.conflicts = FALSE)
 
 some_model <- lm(mpg ~ disp + cyl, data = mtcars)
diff --git a/r/tests/testthat/test-compute.R b/r/tests/testthat/test-compute.R
index 9e487169f4..5821c0fa2d 100644
--- a/r/tests/testthat/test-compute.R
+++ b/r/tests/testthat/test-compute.R
@@ -81,6 +81,9 @@ test_that("arrow_scalar_function() works with auto_convert = 
TRUE", {
 
 test_that("register_scalar_function() adds a compute function to the 
registry", {
   skip_if_not(CanRunWithCapturedR())
+  # TODO(ARROW-17178): User-defined function-friendly ExecPlan execution has
+  # occasional valgrind errors
+  skip_on_linux_devel()
 
   register_scalar_function(
 "times_32",
@@ -88,7 +91,11 @@ test_that("register_scalar_function() adds a c
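Taken together, the changes mean that registering a user-defined function now opts the session into the UDF-friendly collection path via the `R_ARROW_COLLECT_WITH_UDF` environment variable. A hedged sketch of the usage (the `times_32` name follows the test above; argument details are assumptions based on the documented `register_scalar_function()` interface):

```r
library(arrow)
library(dplyr)

# Registering the UDF also sets R_ARROW_COLLECT_WITH_UDF = "true",
# so subsequent collect()/as_arrow_table() calls drain the plan with
# the R event loop the UDF requires
register_scalar_function(
  "times_32",
  function(context, x) x * 32L,
  in_type = int32(),
  out_type = int32(),
  auto_convert = TRUE
)

arrow_table(x = 1:3) %>%
  mutate(y = times_32(x)) %>%
  collect()
```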

[arrow] branch master updated: ARROW-17088: [R] Use `.arrow` as extension of IPC files of datasets (#13690)

2022-08-02 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8cac69c809 ARROW-17088: [R] Use `.arrow` as extension of IPC files of 
datasets (#13690)
8cac69c809 is described below

commit 8cac69c809e2ae9d4ba9c10c7b22869c1fd11323
Author: mopcup <40266799+mop...@users.noreply.github.com>
AuthorDate: Wed Aug 3 06:35:10 2022 +0900

ARROW-17088: [R] Use `.arrow` as extension of IPC files of datasets (#13690)

Lead-authored-by: mopcup 
Co-authored-by: mopcup <40266799+mop...@users.noreply.github.com>
Signed-off-by: Neal Richardson 
---
 r/R/dataset-write.R   |  8 +--
 r/man/write_dataset.Rd|  5 +++--
 r/tests/testthat/test-dataset-write.R | 42 ---
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/r/R/dataset-write.R b/r/R/dataset-write.R
index 496aaad205..e0181ee74f 100644
--- a/r/R/dataset-write.R
+++ b/r/R/dataset-write.R
@@ -34,8 +34,9 @@
 #' use the current `group_by()` columns.
 #' @param basename_template string template for the names of files to be 
written.
 #' Must contain `"{i}"`, which will be replaced with an autoincremented
-#' integer to generate basenames of datafiles. For example, 
`"part-{i}.feather"`
-#' will yield `"part-0.feather", ...`.
+#' integer to generate basenames of datafiles. For example, `"part-{i}.arrow"`
+#' will yield `"part-0.arrow", ...`.
+#' If not specified, it defaults to `"part-{i}."`.
 #' @param hive_style logical: write partition segments as Hive-style
 #' (`key1=value1/key2=value2/file.ext`) or as just bare values. Default is 
`TRUE`.
 #' @param existing_data_behavior The behavior to use when there is already data
@@ -133,6 +134,9 @@ write_dataset <- function(dataset,
   max_rows_per_group = bitwShiftL(1, 20),
   ...) {
   format <- match.arg(format)
+  if (format %in% c("feather", "ipc")) {
+format <- "arrow"
+  }
   if (inherits(dataset, "arrow_dplyr_query")) {
 # partitioning vars need to be in the `select` schema
 dataset <- ensure_group_vars(dataset)
diff --git a/r/man/write_dataset.Rd b/r/man/write_dataset.Rd
index 8fc07d5cc7..1bc940697c 100644
--- a/r/man/write_dataset.Rd
+++ b/r/man/write_dataset.Rd
@@ -38,8 +38,9 @@ use the current \code{group_by()} columns.}
 
 \item{basename_template}{string template for the names of files to be written.
 Must contain \code{"{i}"}, which will be replaced with an autoincremented
-integer to generate basenames of datafiles. For example, 
\code{"part-{i}.feather"}
-will yield \verb{"part-0.feather", ...}.}
+integer to generate basenames of datafiles. For example, 
\code{"part-{i}.arrow"}
+will yield \verb{"part-0.arrow", ...}.
+If not specified, it defaults to \code{"part-{i}."}.}
 
 \item{hive_style}{logical: write partition segments as Hive-style
 (\code{key1=value1/key2=value2/file.ext}) or as just bare values. Default is 
\code{TRUE}.}
diff --git a/r/tests/testthat/test-dataset-write.R 
b/r/tests/testthat/test-dataset-write.R
index 2f4ff7e649..7a5f861ca5 100644
--- a/r/tests/testthat/test-dataset-write.R
+++ b/r/tests/testthat/test-dataset-write.R
@@ -63,7 +63,7 @@ test_that("Writing a dataset: CSV->IPC", {
 
   # Check whether "int" is present in the files or just in the dirs
   first <- read_feather(
-dir(dst_dir, pattern = ".feather$", recursive = TRUE, full.names = 
TRUE)[1],
+dir(dst_dir, pattern = ".arrow$", recursive = TRUE, full.names = TRUE)[1],
 as_data_frame = FALSE
   )
   # It shouldn't be there
@@ -139,6 +139,40 @@ test_that("Writing a dataset: Parquet->Parquet (default)", 
{
   )
 })
 
+test_that("Writing a dataset: `basename_template` default behaviour", {
+  ds <- open_dataset(csv_dir, partitioning = "part", format = "csv")
+
+  dst_dir <- make_temp_dir()
+  write_dataset(ds, dst_dir, format = "parquet", max_rows_per_file = 5L)
+  expect_identical(
+dir(dst_dir, full.names = FALSE, recursive = TRUE),
+paste0("part-", 0:3, ".parquet")
+  )
+  dst_dir <- make_temp_dir()
+  write_dataset(ds, dst_dir, format = "parquet", basename_template = 
"{i}.data", max_rows_per_file = 5L)
+  expect_identical(
+dir(dst_dir, full.names = FALSE, recursive = TRUE),
+paste0(0:3, ".data")
+  )
+  dst_dir <- make_temp_dir()
+  expect_error(
+write_dataset(ds, dst_dir, format = "parquet", basename_template = 
"part-i.parquet"),
+"basename_template did not contain '\\{i\\}'"
+  )
+  fe
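A hedged illustration of the new behaviour (assuming the post-patch format aliasing): `"feather"` and `"ipc"` are normalized to `"arrow"`, so dataset files pick up the `.arrow` extension.

```r
library(arrow)

tf <- tempfile()
write_dataset(mtcars, tf, format = "feather")  # normalized to "arrow"
list.files(tf)  # data files now use the .arrow extension, e.g. "part-0.arrow"
```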

[arrow] branch master updated (95aec82bd6 -> cc63a5da02)

2022-07-27 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 95aec82bd6 ARROW-12693: [R] add unique() methods for ArrowTabular, 
datasets (#13641)
 add cc63a5da02 ARROW-16612: [R] Fix compression inference from filename 
(#13625)

No new revisions were added by this update.

Summary of changes:
 r/R/csv.R  | 40 +++-
 r/R/feather.R  | 21 +++
 r/R/io.R   | 76 --
 r/R/ipc-stream.R   | 10 -
 r/R/json.R |  5 +++
 r/R/parquet.R  |  9 +
 r/man/make_readable_file.Rd| 11 +-
 r/man/read_feather.Rd  |  6 +--
 r/man/read_ipc_stream.Rd   |  6 ---
 r/man/write_feather.Rd |  9 +++--
 r/man/write_ipc_stream.Rd  |  6 ---
 r/tests/testthat/test-compressed.R |  8 
 r/tests/testthat/test-csv.R| 25 -
 r/tests/testthat/test-feather.R| 16 
 r/tests/testthat/test-parquet.R| 16 
 15 files changed, 145 insertions(+), 119 deletions(-)
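A sketch of the repaired inference, under the assumption that the post-fix readers and writers derive the codec from the file extension:

```r
library(arrow)

tf <- tempfile(fileext = ".csv.gz")
write_csv_arrow(data.frame(x = 1:3), tf)  # gzip codec inferred from ".gz"
read_csv_arrow(tf)                        # decompression likewise inferred
```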



[arrow] branch master updated: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date (#12154)

2022-07-21 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new b0734e68d6 ARROW-14821: [R] Implement bindings for lubridate's 
floor_date, ceiling_date, and round_date (#12154)
b0734e68d6 is described below

commit b0734e68d6f57fb22869df0d0aa2ae4dd75765dc
Author: Danielle Navarro 
AuthorDate: Fri Jul 22 08:31:02 2022 +1000

ARROW-14821: [R] Implement bindings for lubridate's floor_date, 
ceiling_date, and round_date (#12154)

This patch provides dplyr bindings for lubridate functions 
`floor_date()`, `ceiling_date()`, and `round_date()`. This is my first attempt 
at writing a patch, so my apologies if I've made any errors 

### Supported functionality:

- Allows rounding to integer multiples of common time units (second, 
minutes, days, etc)
- Mirrors the lubridate syntax allowing fractional units such as `unit = 
.001 seconds` as an alias for `unit = 1 millisecond`
- Allows partial matching of date units based on first three characters: 
e.g. `sec`, `second`, `seconds` all match `second`
- Mirrors lubridate in throwing errors when unit exceeds thresholds: 60 
seconds, 60 minutes, 24 hours

~~### Major problems not yet addressed:~~

~~- Does not yet support the `week_start` argument, and implicitly fixes 
`week_start = 4`~~
~~- Does not yet mirror lubridate handling of timezones~~

~~I'd prefer to fix these two issues before merging, but I'm uncertain how 
best to handle them. Any advice would be appreciated!~~

~~### Minor things not yet addressed~~

~~- During rounding lubridate sometimes coerces Date objects to POSIXct. 
This is not mirrored in the arrow bindings: date32 classes remain date32 
classes. This introduces minor differences in rounding in some cases~~
~~- Does not yet support the `change_on_boundary` argument to 
`ceiling_date()`. It's a small discrepancy, but it means that the default 
behaviour of the arrow dplyr binding mirrors lubridate prior to v1.6.0~~

EDIT: issues now addressed!


Authored-by: Danielle Navarro 
Signed-off-by: Neal Richardson 
---
 r/R/dplyr-datetime-helpers.R | 158 
 r/R/dplyr-funcs-datetime.R   |  52 +++
 r/tests/testthat/test-dplyr-funcs-datetime.R | 578 +++
 3 files changed, 788 insertions(+)

diff --git a/r/R/dplyr-datetime-helpers.R b/r/R/dplyr-datetime-helpers.R
index 9199ce0dd5..efcc62ff4e 100644
--- a/r/R/dplyr-datetime-helpers.R
+++ b/r/R/dplyr-datetime-helpers.R
@@ -417,3 +417,161 @@ build_strptime_exprs <- function(x, formats) {
 )
   )
 }
+
+# This function parses the "unit" argument to round_date, floor_date, and
+# ceiling_date. The input x is a single string like "second", "3 seconds",
+# "10 microseconds" or "2 secs" used to specify the size of the unit to
+# which the temporal data should be rounded. The matching rules implemented
+# are designed to mirror lubridate exactly: it extracts the numeric multiple
+# from the start of the string (presumed to be 1 if no number is present)
+# and selects the unit by looking at the first 3 characters only. This choice
+# ensures that "secs", "second", "microsecs" etc are all valid, but it is
+# very permissive and would interpret "mickeys" as microseconds. This
+# permissive implementation mirrors the corresponding implementation in
+# lubridate. The return value is a list with integer-valued components
+# "multiple" and  "unit"
+parse_period_unit <- function(x) {
+  # the regexp matches against fractional units, but per lubridate
+  # supports integer multiples of a known unit only
+  match_info <- regexpr(
+pattern = " *(?[0-9.,]+)? *(?[^ \t\n]+)",
+text = x[[1]],
+perl = TRUE
+  )
+
+  capture_start <- attr(match_info, "capture.start")
+  capture_length <- attr(match_info, "capture.length")
+  capture_end <- capture_start + capture_length - 1L
+
+  str_unit <- substr(x, capture_start[[2]], capture_end[[2]])
+  str_multiple <- substr(x, capture_start[[1]], capture_end[[1]])
+
+  known_units <- c("nanosecond", "microsecond", "millisecond", "second",
+   "minute", "hour", "day", "week", "month", "quarter", "year")
+
+  # match the period unit
+  str_unit_start <- substr(str_unit, 1, 3)
+  unit <- as.integer(pmatch(str_unit_start, known_units)) - 1L
+
+  if (any(is.na(unit))) {
+abort(
+  sprintf(
+"Invalid period name: '%s'",
+str_unit,
+". Known units are",
+oxford_past
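A hedged usage sketch of the bindings this commit adds (the timestamp value is illustrative; unit strings follow the parsing rules described in the comment above, e.g. integer multiples and three-character prefix matching):

```r
library(arrow)
library(dplyr)
library(lubridate)

tbl <- arrow_table(ts = as.POSIXct("2022-07-21 10:11:12", tz = "UTC"))

tbl %>%
  mutate(
    f = floor_date(ts, "3 seconds"),  # integer multiple of a known unit
    c = ceiling_date(ts, "minute"),
    r = round_date(ts, "hour")
  ) %>%
  collect()
```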

[arrow] branch master updated: ARROW-8324: [R] Add read/write_ipc_file separate from _feather (#13626)

2022-07-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d81d8451a0 ARROW-8324: [R] Add read/write_ipc_file separate from 
_feather (#13626)
d81d8451a0 is described below

commit d81d8451a0ff1c5108bc04e727ae053365950551
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Tue Jul 19 05:15:50 2022 +0900

ARROW-8324: [R] Add read/write_ipc_file separate from _feather (#13626)

Add `read_ipc_file()` and `write_ipc_file()` to read and write Arrow IPC 
files (Feather V2).
These are much the same as `read_feather()`/`write_feather()` for now, but 
in the future *_feather functions may move to a different implementation to 
accommodate Feather V1 format.
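A minimal usage sketch of the new function pair (assuming the arrow package is installed; the temp-file path is illustrative):

```r
library(arrow)

# write an Arrow IPC file (Feather V2) and read it back
tf <- tempfile(fileext = ".arrow")
write_ipc_file(mtcars, tf)
df <- read_ipc_file(tf)
```

Unlike `write_feather()`, `write_ipc_file()` never emits the legacy Feather V1 format.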

Authored-by: SHIMA Tatsuya 
Signed-off-by: Neal Richardson 
---
 r/NAMESPACE |  2 ++
 r/NEWS.md   |  7 +-
 r/R/feather.R   | 56 -
 r/man/read_feather.Rd   | 13 +++---
 r/man/write_feather.Rd  | 38 ++--
 r/tests/testthat/test-feather.R | 33 
 r/vignettes/arrow.Rmd   |  3 ++-
 7 files changed, 126 insertions(+), 26 deletions(-)

diff --git a/r/NAMESPACE b/r/NAMESPACE
index c7d2657bae..750a815f9f 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -335,6 +335,7 @@ export(open_dataset)
 export(read_csv_arrow)
 export(read_delim_arrow)
 export(read_feather)
+export(read_ipc_file)
 export(read_ipc_stream)
 export(read_json_arrow)
 export(read_message)
@@ -370,6 +371,7 @@ export(vctrs_extension_type)
 export(write_csv_arrow)
 export(write_dataset)
 export(write_feather)
+export(write_ipc_file)
 export(write_ipc_stream)
 export(write_parquet)
 export(write_to_raw)
diff --git a/r/NEWS.md b/r/NEWS.md
index fca55b047e..59245b971d 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -24,7 +24,12 @@
 * `lubridate::parse_date_time()` datetime parser:
   * `orders` with year, month, day, hours, minutes, and seconds components are 
supported.
   * the `orders` argument in the Arrow binding works as follows: `orders` are 
transformed into `formats` which subsequently get applied in turn. There is no 
`select_formats` parameter and no inference takes place (like is the case in 
`lubridate::parse_date_time()`).
-* `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have 
been removed. Use the `read/write_feather()` and `read/write_ipc_stream()` 
functions depending on whether you're working with the Arrow IPC file or stream 
format, respectively.
+* New functions `read_ipc_file()` and `write_ipc_file()` are added.
+  These functions are almost the same as `read_feather()` and 
`write_feather()`,
+  but differ in that they only target IPC files (Feather V2 files), not 
Feather V1 files.
+* `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have 
been removed.
+  Instead of these, use the `read_ipc_file()` and `write_ipc_file()` for IPC 
files, or,
+  `read_ipc_stream()` and `write_ipc_stream()` for IPC streams.
 * `write_parquet()` now defaults to writing Parquet format version 2.4 (was 
1.0). Previously deprecated arguments `properties` and `arrow_properties` have 
been removed; if you need to deal with these lower-level properties objects 
directly, use `ParquetFileWriter`, which `write_parquet()` wraps.
 
 # arrow 8.0.0
diff --git a/r/R/feather.R b/r/R/feather.R
index 02871396fa..46863c98a1 100644
--- a/r/R/feather.R
+++ b/r/R/feather.R
@@ -15,19 +15,23 @@
 # specific language governing permissions and limitations
 # under the License.
 
-#' Write data in the Feather format
+#' Write a Feather file (an Arrow IPC file)
 #'
 #' Feather provides binary columnar serialization for data frames.
 #' It is designed to make reading and writing data frames efficient,
 #' and to make sharing data across data analysis languages easy.
-#' This function writes both the original, limited specification of the format
-#' and the version 2 specification, which is the Apache Arrow IPC file format.
+#' [write_feather()] can write both the Feather Version 1 (V1),
+#' a legacy version available starting in 2016, and the Version 2 (V2),
+#' which is the Apache Arrow IPC file format.
+#' The default version is V2.
+#' V1 files are distinct from Arrow IPC files and lack many features,
+#' such as the ability to store all Arrow data types, and compression support.
+#' [write_ipc_file()] can only write V2 files.
 #'
 #' @param x `data.frame`, [RecordBatch], or [Table]
 #' @param sink A string file path, URI, or [OutputStream], or path in a file
 #' system (`SubTreeFileSystem`)
-#' @param version integer Feather file version. Version 2 is the current.
-#' Version 1 is the more limited legacy format.
+#' @param version integer Feather file version, Version 1 or Version 2;

[arrow] branch master updated: ARROW-17102: [R] Test fails on R minimal nightly builds due to Parquet writing (#13631)

2022-07-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 72d2d24851 ARROW-17102: [R] Test fails on R minimal nightly builds due 
to Parquet writing (#13631)
72d2d24851 is described below

commit 72d2d248517c0d6f42ef921ed996c92e634e7a81
Author: Nic Crane 
AuthorDate: Mon Jul 18 20:15:56 2022 +0100

ARROW-17102: [R] Test fails on R minimal nightly builds due to Parquet 
writing (#13631)

Authored-by: Nic Crane 
Signed-off-by: Neal Richardson 
---
 r/tests/testthat/test-dplyr-summarize.R | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/r/tests/testthat/test-dplyr-summarize.R 
b/r/tests/testthat/test-dplyr-summarize.R
index 3711b49975..f799fcbf38 100644
--- a/r/tests/testthat/test-dplyr-summarize.R
+++ b/r/tests/testthat/test-dplyr-summarize.R
@@ -237,6 +237,8 @@ test_that("Group by any/all", {
 })
 
 test_that("n_distinct() with many batches", {
+  skip_if_not_available("parquet")
+
   tf <- tempfile()
   write_parquet(dplyr::starwars, tf, chunk_size = 20)
 



[arrow] branch master updated: ARROW-14575: [R] Allow functions with `pkg::` prefixes (#13160)

2022-07-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3e0eea1244 ARROW-14575: [R] Allow functions with `pkg::` prefixes 
(#13160)
3e0eea1244 is described below

commit 3e0eea1244a066a6aee3262440093df021c37882
Author: Dragoș Moldovan-Grünfeld 
AuthorDate: Fri Jul 15 22:23:50 2022 +0100

ARROW-14575: [R] Allow functions with `pkg::` prefixes (#13160)

This PR will allow the use of namespacing with bindings:
``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

test_df <- tibble(
  date = as.Date(c("2022-03-22", "2021-07-30", NA))
)

test_df %>%
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date   ddate
#>   <date>     <dttm>
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA NA

test_df %>%
  arrow_table() %>%
  mutate(ddate = lubridate::as_datetime(date)) %>%
  collect()
#> # A tibble: 3 × 2
#>   date   ddate
#>   <date>     <dttm>
#> 1 2022-03-22 2022-03-22 00:00:00
#> 2 2021-07-30 2021-07-30 00:00:00
#> 3 NA NA
```

Created on 2022-05-14 by the [reprex 
package](https://reprex.tidyverse.org) (v2.0.1)

The approach (option 1 from the [design 
doc](https://docs.google.com/document/d/1Om-vYb31b6p_u4tyl86SGW1DrtWBfksq8NYG1Seqaxg/edit#)):

- [x] add functionality to allow binding registration with the `pkg::fun()` 
name;
- [x] Modify `register_binding()` to register 2 identical copies for 
each `pkg::fun` binding, namely `fun` and `pkg::fun`.
- [x] Add a binding for the `::` operator, which helps with retrieving 
bindings from the function registry.
- [x] Add generic unit tests for the `pkg::fun` functionality.
- [x] Warn for a duplicated binding registration.
- [x] register `nse_funcs` requiring _indirect_ mapping
- [x] register each binding with and without the `pkg::` prefix.
- [x] add / update unit tests for the `nse_funcs` bindings to include 
at least one `pkg::fun()` call for each binding

 
 unit tests for conditional bindings

 - [x] `"dplyr::coalesce"`
 - [x] `"dplyr::if_else"`
 - [x] `"base::ifelse"`
 - [x] `"dplyr::case_when"`

 

 
 unit tests for date/time bindings

 - [x] `"base::strptime"`
 - [x] `"base::strftime"`
 - [x] `"lubridate::format_ISO8601"`
 - [x] `"lubridate::is.Date"`
 - [x] `"lubridate::is.instant"`
 - [x] `"lubridate::is.timepoint"`
 - [x] `"lubridate::is.POSIXct"`
 - [x] `"lubridate::date"`
 - [x] `"lubridate::second"`
 - [x] `"lubridate::wday"`
 - [x] `"lubridate::week"`
 - [x] `"lubridate::month"`
 - [x] `"lubridate::am"`
 - [x] `"lubridate::pm"`
 - [x] `"lubridate::tz"`
 - [x] `"lubridate::semester"`
 - [x] `"lubridate::make_datetime"`
 - [x] `"lubridate::make_date"`
 - [x] `"base::ISOdatetime"`
 - [x] `"base::ISOdate"`
 - [x] `"base::as.Date"`
 - [x] `"lubridate::as_date"`
 - [x] `"lubridate::as_datetime"`
 - [x] `"lubridate::decimal_date"`
 - [x] `"lubridate::date_decimal"`
 - [x] `"base::difftime"`
 - [x] `"base::as.difftime"`
 - [x] `"lubridate::make_difftime"`
 - [x] `"lubridate::dminutes"`
 - [x] `"lubridate::dhours"`
 - [x] `"lubridate::ddays"`
 - [x] `"lubridate::dweeks"`
 - [x] `"lubridate::dmonths"`
 - [x] `"lubridate::dyears"`
 - [x] `"lubridate::dseconds"`
 - [x] `"lubridate::dmilliseconds"`
 - [x] `"lubridate::dmicroseconds"`
 - [x] `"lubridate::dnanoseconds"`
 - [x] `"lubridate::dpicoseconds"`
 - [x] `"lubridate::parse_date_time"`
 - [x] `"lubridate::ymd"`
 - [x] `"lubridate::ydm"`
 - [x] `"lubridate::mdy"`
 - [x] `"lubridate::myd"`
 - [x] `"lubridate::dmy"`
 - [x] `"lubridate::dym"`
 - [x] `&

[arrow] branch master updated: ARROW-17085: [R] group_vars() should not return NULL (#13621)

2022-07-15 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 29cc263068 ARROW-17085: [R] group_vars() should not return NULL 
(#13621)
29cc263068 is described below

commit 29cc263068b983e690879d4d768025439a0fdd47
Author: eitsupi <50911393+eits...@users.noreply.github.com>
AuthorDate: Sat Jul 16 01:06:57 2022 +0900

ARROW-17085: [R] group_vars() should not return NULL (#13621)

If an ungrouped data.frame or an `arrow_dplyr_query` is given to 
`dplyr::group_vars()`, `character()` is returned.
But for an ungrouped Table, `NULL` is returned.

```r
mtcars |> dplyr::group_vars()
#> character(0)
mtcars |> arrow:::as_adq() |> dplyr::group_vars()
#> character(0)
mtcars |> arrow::arrow_table() |> dplyr::group_vars()
#> NULL
```

Therefore, functions that expect `group_vars` to return character, such as 
the following, will fail.

```r
mtcars |> arrow::arrow_table() |> dtplyr::lazy_dt()
#> Error in new_step(parent, vars = names(parent), groups = groups, locals 
= list(), : is.character(groups) is not TRUE
```

This PR modifies `dplyr::group_vars()` and `dplyr::groups()` for Arrow 
objects to work the same as for data.frame.
(Note that `arrow_dplyr_query` already works the same way as data.frame.)

Lead-authored-by: SHIMA Tatsuya 
Co-authored-by: eitsupi <50911393+eits...@users.noreply.github.com>
Signed-off-by: Neal Richardson 
---
 r/R/dplyr-group-by.R|  8 
 r/R/dplyr.R |  2 +-
 r/tests/testthat/test-RecordBatch.R | 10 --
 r/tests/testthat/test-Table.R   |  8 +++-
 r/tests/testthat/test-metadata.R|  2 +-
 5 files changed, 21 insertions(+), 9 deletions(-)
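The `%||%` operator the patch leans on is rlang's null-default operator; a base-R sketch of its behavior:

```r
# null-default: return the left-hand side unless it is NULL
`%||%` <- function(x, y) if (is.null(x)) y else x

NULL %||% character()  # character(0)
"fct" %||% character() # "fct"
```

This is what lets `group_vars.ArrowTabular` fall back to `character()` when no `.group_vars` metadata is stored.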

diff --git a/r/R/dplyr-group-by.R b/r/R/dplyr-group-by.R
index 250dbedb18..c650799e8d 100644
--- a/r/R/dplyr-group-by.R
+++ b/r/R/dplyr-group-by.R
@@ -58,13 +58,13 @@ group_by.arrow_dplyr_query <- function(.data,
 group_by.Dataset <- group_by.ArrowTabular <- group_by.RecordBatchReader <- 
group_by.arrow_dplyr_query
 
 groups.arrow_dplyr_query <- function(x) syms(dplyr::group_vars(x))
-groups.Dataset <- groups.ArrowTabular <- groups.RecordBatchReader <- 
function(x) NULL
+groups.Dataset <- groups.ArrowTabular <- groups.RecordBatchReader <- 
groups.arrow_dplyr_query
 
 group_vars.arrow_dplyr_query <- function(x) x$group_by_vars
-group_vars.Dataset <- function(x) NULL
-group_vars.RecordBatchReader <- function(x) NULL
+group_vars.Dataset <- function(x) character()
+group_vars.RecordBatchReader <- function(x) character()
 group_vars.ArrowTabular <- function(x) {
-  x$metadata$r$attributes$.group_vars
+  x$metadata$r$attributes$.group_vars %||% character()
 }
 
 # the logical literal in the two functions below controls the default value of
diff --git a/r/R/dplyr.R b/r/R/dplyr.R
index b048d98018..1296e60384 100644
--- a/r/R/dplyr.R
+++ b/r/R/dplyr.R
@@ -42,7 +42,7 @@ arrow_dplyr_query <- function(.data) {
   gv <- tryCatch(
 # If dplyr is not available, or if the input doesn't have a group_vars
 # method, assume no group vars
-dplyr::group_vars(.data) %||% character(),
+dplyr::group_vars(.data),
 error = function(e) character()
   )
 
diff --git a/r/tests/testthat/test-RecordBatch.R 
b/r/tests/testthat/test-RecordBatch.R
index e7602d9f74..6b79325934 100644
--- a/r/tests/testthat/test-RecordBatch.R
+++ b/r/tests/testthat/test-RecordBatch.R
@@ -654,7 +654,7 @@ test_that("Handling string data with embedded nuls", {
   })
 })
 
-test_that("ARROW-11769/ARROW-13860 - grouping preserved in record batch 
creation", {
+test_that("ARROW-11769/ARROW-13860/ARROW-17085 - grouping preserved in record 
batch creation", {
   skip_if_not_available("dataset")
   library(dplyr, warn.conflicts = FALSE)
 
@@ -670,6 +670,12 @@ test_that("ARROW-11769/ARROW-13860 - grouping preserved in 
record batch creation
   record_batch(),
 "RecordBatch"
   )
+  expect_identical(
+tbl %>%
+  record_batch() %>%
+  group_vars(),
+group_vars(tbl)
+  )
   expect_identical(
 tbl %>%
   group_by(fct, fct2) %>%
@@ -683,7 +689,7 @@ test_that("ARROW-11769/ARROW-13860 - grouping preserved in 
record batch creation
   record_batch() %>%
   ungroup() %>%
   group_vars(),
-NULL
+character()
   )
   expect_identical(
 tbl %>%
diff --git a/r/tests/testthat/test-Table.R b/r/tests/testthat/test-Table.R
index 5edba2cd4a..bafd183108 100644
--- a/r/tests/testthat/test-Table.R
+++ b/r/tests/testthat/test-Table.R
@@ -592,7 +592,7 @@ test_that("cbind.Table handles record batches and tables", {
   )
 })
 
-test_t

[arrow] branch master updated: MINOR: [R] Conditionally skip some glimpse-related tests (#13610)

2022-07-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f295da4cfd MINOR: [R] Conditionally skip some glimpse-related tests 
(#13610)
f295da4cfd is described below

commit f295da4cfdcf102d9ac2d16bbca6f8342fc3e6a8
Author: Neal Richardson 
AuthorDate: Thu Jul 14 19:17:54 2022 -0400

MINOR: [R] Conditionally skip some glimpse-related tests (#13610)

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/tests/testthat/helper-skip.R   | 4 ++--
 r/tests/testthat/test-Array.R| 2 +-
 r/tests/testthat/test-RecordBatch.R  | 2 +-
 r/tests/testthat/test-altrep.R   | 2 +-
 r/tests/testthat/test-chunked-array.R| 2 +-
 r/tests/testthat/test-csv.R  | 2 +-
 r/tests/testthat/test-dplyr-funcs-datetime.R | 2 +-
 r/tests/testthat/test-dplyr-funcs-type.R | 2 +-
 r/tests/testthat/test-dplyr-glimpse.R| 5 +
 r/tests/testthat/test-dplyr-query.R  | 3 +++
 r/tests/testthat/test-feather.R  | 2 +-
 r/tests/testthat/test-safe-call-into-r.R | 4 ++--
 r/tests/testthat/test-scalar.R   | 2 +-
 13 files changed, 21 insertions(+), 13 deletions(-)
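The rename from `skip_if_r_version()` to `skip_on_r_older_than()` also changes `<=` to `<`. The comparison works because `getRversion()` returns a `numeric_version` object, which compares component-wise rather than lexically:

```r
# numeric_version comparison is component-wise, not string comparison:
as.numeric_version("3.10.0") > "3.9.0"  # TRUE (10 > 9); lexically it would be FALSE

# so skip_on_r_older_than("3.6") skips exactly when the running R
# predates 3.6.0, rather than also skipping on 3.6.0 itself
getRversion() < "3.6"
```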

diff --git a/r/tests/testthat/helper-skip.R b/r/tests/testthat/helper-skip.R
index 24e5b3f7dc..fd1ce1a76c 100644
--- a/r/tests/testthat/helper-skip.R
+++ b/r/tests/testthat/helper-skip.R
@@ -92,12 +92,12 @@ skip_on_linux_devel <- function() {
   }
 }
 
-skip_if_r_version <- function(r_version) {
+skip_on_r_older_than <- function(r_version) {
   if (force_tests()) {
 return()
   }
 
-  if (getRversion() <= r_version) {
+  if (getRversion() < r_version) {
 skip(paste("R version:", getRversion()))
   }
 }
diff --git a/r/tests/testthat/test-Array.R b/r/tests/testthat/test-Array.R
index ebc6085095..56c7028d6a 100644
--- a/r/tests/testthat/test-Array.R
+++ b/r/tests/testthat/test-Array.R
@@ -785,7 +785,7 @@ test_that("Handling string data with embedded nuls", {
   # The behavior of the warnings/errors is slightly different with and without
   # altrep. Without it (i.e. 3.5.0 and below, the error would trigger 
immediately
   # on `as.vector()` where as with it, the error only happens on 
materialization)
-  skip_if_r_version("3.5.0")
+  skip_on_r_older_than("3.6")
 
   # no error on conversion, because altrep laziness
   v <- expect_error(as.vector(array_with_nul), NA)
diff --git a/r/tests/testthat/test-RecordBatch.R 
b/r/tests/testthat/test-RecordBatch.R
index a39aa0f0fb..e7602d9f74 100644
--- a/r/tests/testthat/test-RecordBatch.R
+++ b/r/tests/testthat/test-RecordBatch.R
@@ -626,7 +626,7 @@ test_that("Handling string data with embedded nuls", {
   # The behavior of the warnings/errors is slightly different with and without
   # altrep. Without it (i.e. 3.5.0 and below, the error would trigger 
immediately
   # on `as.vector()` where as with it, the error only happens on 
materialization)
-  skip_if_r_version("3.5.0")
+  skip_on_r_older_than("3.6")
   df <- as.data.frame(batch_with_nul)
 
   expect_error(
diff --git a/r/tests/testthat/test-altrep.R b/r/tests/testthat/test-altrep.R
index 082a3ea91f..cd1d841c42 100644
--- a/r/tests/testthat/test-altrep.R
+++ b/r/tests/testthat/test-altrep.R
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-skip_if_r_version("3.5.0")
+skip_on_r_older_than("3.6")
 
 test_that("is_arrow_altrep() does not include base altrep", {
   expect_false(is_arrow_altrep(1:10))
diff --git a/r/tests/testthat/test-chunked-array.R 
b/r/tests/testthat/test-chunked-array.R
index 5f32184efc..ce43d84274 100644
--- a/r/tests/testthat/test-chunked-array.R
+++ b/r/tests/testthat/test-chunked-array.R
@@ -478,7 +478,7 @@ test_that("Handling string data with embedded nuls", {
   # The behavior of the warnings/errors is slightly different with and without
   # altrep. Without it (i.e. 3.5.0 and below, the error would trigger 
immediately
   # on `as.vector()` where as with it, the error only happens on 
materialization)
-  skip_if_r_version("3.5.0")
+  skip_on_r_older_than("3.6")
 
   v <- expect_error(as.vector(chunked_array_with_nul), NA)
 
diff --git a/r/tests/testthat/test-csv.R b/r/tests/testthat/test-csv.R
index 8e463d3abe..fca717cc05 100644
--- a/r/tests/testthat/test-csv.R
+++ b/r/tests/testthat/test-csv.R
@@ -295,7 +295,7 @@ test_that("more informative error when reading a CSV with 
headers and schema", {
 test_that("read_csv_arrow() and write_csv_arrow() accept connection objects", {
   # connections with csv need RunWithCapturedR, which is not available
   # in R <= 3.4.4
-  skip_if_r_version("3.4.4")

[arrow] branch master updated (5d86e9fc40 -> 87d1889092)

2022-07-14 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 5d86e9fc40 ARROW-16734: [C++] Bump vendored version of protobuf 
(#13581)
 add 87d1889092 ARROW-16977: [R] Update dataset row counting so no integer 
overflow on large datasets (#13514)

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE   |  1 +
 r/R/arrow-package.R   |  2 +-
 r/R/record-batch.R|  4 ++--
 r/R/util.R|  2 +-
 r/src/array.cpp   | 24 ++---
 r/src/arrowExports.cpp| 50 +--
 r/src/buffer.cpp  |  8 +++
 r/src/chunkedarray.cpp| 19 +---
 r/src/dataset.cpp |  4 ++--
 r/src/filesystem.cpp  |  4 +++-
 r/src/io.cpp  | 21 +-
 r/src/message.cpp |  9 
 r/src/parquet.cpp |  4 ++--
 r/src/recordbatch.cpp |  8 +++
 r/src/table.cpp   |  8 +++
 r/tests/testthat/test-Table.R | 25 ++
 16 files changed, 113 insertions(+), 80 deletions(-)
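Background for the overflow fix: R's integer type is 32-bit, so row counts past about 2.1 billion cannot be represented as integers and must be returned as doubles. A quick illustration:

```r
.Machine$integer.max                     # 2147483647
suppressWarnings(2147483647L + 1L)       # NA: integer arithmetic overflows
2147483647 + 1                           # 2147483648: double arithmetic is fine
```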



[arrow] branch master updated: ARROW-16776: [R] dplyr::glimpse method for arrow table and datasets (#13563)

2022-07-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new c6534a59a3 ARROW-16776: [R] dplyr::glimpse method for arrow table and 
datasets (#13563)
c6534a59a3 is described below

commit c6534a59a38acd31856284bcdfa36ecea7d11479
Author: Neal Richardson 
AuthorDate: Tue Jul 12 15:48:16 2022 -0400

ARROW-16776: [R] dplyr::glimpse method for arrow table and datasets (#13563)

See reprex (sans terminal formatting) in 
[r/tests/testthat/_snaps/dplyr-glimpse.md](https://github.com/apache/arrow/pull/13563/files#diff-e8d50da600908f077796a43b7600c17d34448671c7975bb8c4056a484ac2999e)

Not all queries can be glimpse()d: some would require evaluating the whole 
query, which may be expensive (and can't be interrupted yet, see ARROW-11841).

Note that the existing `print()` methods aren't affected by this. There is 
still the idea that the print methods for Table/RecordBatch should print some 
data (ARROW-16777 and others), but that should probably be column-oriented 
instead of row-oriented like glimpse().

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/DESCRIPTION|   3 +
 r/NAMESPACE  |   2 +
 r/R/arrow-object.R   |   6 +-
 r/R/arrow-package.R  |   3 +-
 r/R/chunked-array.R  |   3 +-
 r/R/dplyr-count.R|   2 +-
 r/R/dplyr-glimpse.R  | 160 +++
 r/R/dplyr.R  |  47 -
 r/R/extension.R  |  22 +
 r/R/filesystem.R |   1 -
 r/R/query-engine.R   |   4 +-
 r/tests/testthat/_snaps/dplyr-glimpse.md | 152 +
 r/tests/testthat/test-chunked-array.txt  |   4 +
 r/tests/testthat/test-data-type.R|  19 ++--
 r/tests/testthat/test-dplyr-glimpse.R| 102 
 r/tests/testthat/test-dplyr-query.R  | 140 +++
 r/tests/testthat/test-extension.R|   2 +-
 r/tests/testthat/test-schema.R   |  11 +--
 18 files changed, 637 insertions(+), 46 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 2cbbec054a..a7408d27d6 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -44,6 +44,7 @@ RoxygenNote: 7.2.0
 Config/testthat/edition: 3
 VignetteBuilder: knitr
 Suggests:
+cli,
 DBI,
 dbplyr,
 decor,
@@ -53,6 +54,7 @@ Suggests:
 hms,
 knitr,
 lubridate,
+pillar,
 pkgload,
 reticulate,
 rmarkdown,
@@ -103,6 +105,7 @@ Collate:
 'dplyr-funcs-type.R'
 'expression.R'
 'dplyr-funcs.R'
+'dplyr-glimpse.R'
 'dplyr-group-by.R'
 'dplyr-join.R'
 'dplyr-mutate.R'
diff --git a/r/NAMESPACE b/r/NAMESPACE
index 023e9bb831..86eb958471 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -453,6 +453,8 @@ importFrom(tidyselect,starts_with)
 importFrom(tidyselect,vars_pull)
 importFrom(tidyselect,vars_rename)
 importFrom(tidyselect,vars_select)
+importFrom(utils,capture.output)
+importFrom(utils,getFromNamespace)
 importFrom(utils,head)
 importFrom(utils,install.packages)
 importFrom(utils,modifyList)
diff --git a/r/R/arrow-object.R b/r/R/arrow-object.R
index 0a82f85877..ac067d4aa5 100644
--- a/r/R/arrow-object.R
+++ b/r/R/arrow-object.R
@@ -31,14 +31,16 @@ ArrowObject <- R6Class("ArrowObject",
   }
   assign(".:xp:.", xp, envir = self)
 },
-print = function(...) {
+class_title = function() {
   if (!is.null(self$.class_title)) {
 # Allow subclasses to override just printing the class name first
 class_title <- self$.class_title()
   } else {
 class_title <- class(self)[[1]]
   }
-  cat(class_title, "\n", sep = "")
+},
+print = function(...) {
+  cat(self$class_title(), "\n", sep = "")
   if (!is.null(self$ToString)) {
 cat(self$ToString(), "\n", sep = "")
   }
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 05270ef6bb..a2c37d0ce3 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -41,7 +41,7 @@
   "group_vars", "group_by_drop_default", "ungroup", "mutate", "transmute",
   "arrange", "rename", "pull", "relocate", "compute", "collapse",
   "distinct", "left_join", "right_join", "inner_join", "full_join",
-  "semi_join", "anti_join", "count", "tally", "rename_with", "union", 
"union_all"
+  "semi_join", "anti_jo

[arrow] branch master updated: MINOR: [R] Cleanup skips and TODOs (#13576)

2022-07-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a01b0c20c7 MINOR: [R] Cleanup skips and TODOs (#13576)
a01b0c20c7 is described below

commit a01b0c20c7e2c3283cf195de38372b998dbf17d5
Author: Neal Richardson 
AuthorDate: Tue Jul 12 09:02:40 2022 -0400

MINOR: [R] Cleanup skips and TODOs (#13576)

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/R/array.R  |  6 ---
 r/R/arrow-datum.R| 16 +--
 r/R/chunked-array.R  | 12 --
 r/R/compute.R|  1 -
 r/R/dplyr-datetime-helpers.R | 20 -
 r/R/dplyr-distinct.R |  3 +-
 r/R/dplyr-funcs-datetime.R   |  5 +--
 r/R/dplyr-summarize.R|  1 -
 r/src/altrep.cpp |  1 -
 r/tests/testthat/test-compute-arith.R|  6 +--
 r/tests/testthat/test-compute-sort.R |  4 +-
 r/tests/testthat/test-dplyr-collapse.R   | 12 --
 r/tests/testthat/test-dplyr-distinct.R   |  2 +-
 r/tests/testthat/test-dplyr-filter.R | 10 -
 r/tests/testthat/test-dplyr-funcs-datetime.R | 63 +++-
 r/tests/testthat/test-dplyr-funcs-type.R |  3 +-
 r/tests/testthat/test-dplyr-mutate.R |  2 +-
 r/tests/testthat/test-dplyr-summarize.R  |  2 +-
 r/tools/autobrew |  3 +-
 19 files changed, 76 insertions(+), 96 deletions(-)

diff --git a/r/R/array.R b/r/R/array.R
index 89e9fbfa33..9ae7631e7d 100644
--- a/r/R/array.R
+++ b/r/R/array.R
@@ -155,12 +155,6 @@ Array <- R6Class("Array",
   assert_is(i, "Array")
   call_function("filter", self, i, options = list(keep_na = keep_na))
 },
-SortIndices = function(descending = FALSE) {
-  assert_that(is.logical(descending))
-  assert_that(length(descending) == 1L)
-  assert_that(!is.na(descending))
-  call_function("array_sort_indices", self, options = list(order = 
descending))
-},
 RangeEquals = function(other, start_idx, end_idx, other_start_idx = 0L) {
   assert_is(other, "Array")
   Array__RangeEquals(self, other, start_idx, end_idx, other_start_idx)
diff --git a/r/R/arrow-datum.R b/r/R/arrow-datum.R
index 39362628bb..8632ca3053 100644
--- a/r/R/arrow-datum.R
+++ b/r/R/arrow-datum.R
@@ -26,6 +26,16 @@ ArrowDatum <- R6Class("ArrowDatum",
   opts <- cast_options(safe, ...)
   opts$to_type <- as_type(target_type)
   call_function("cast", self, options = opts)
+},
+SortIndices = function(descending = FALSE) {
+  assert_that(is.logical(descending))
+  assert_that(length(descending) == 1L)
+  assert_that(!is.na(descending))
+  call_function(
+"sort_indices",
+self,
+options = list(names = "", orders = as.integer(descending))
+  )
 }
   )
 )
@@ -55,8 +65,8 @@ is.na.ArrowDatum <- function(x) {
 #' @export
 is.nan.ArrowDatum <- function(x) {
   if (x$type_id() %in% TYPES_WITH_NAN) {
-# TODO: if an option is added to the is_nan kernel to treat NA as NaN,
-# use that to simplify the code here (ARROW-13366)
+# TODO(ARROW-13366): if an option is added to the is_nan kernel to treat NA
+# as NaN, use that to simplify the code here
 call_function("is_nan", x) & call_function("is_valid", x)
   } else {
 Scalar$create(FALSE)$as_array(length(x))
@@ -336,7 +346,7 @@ sort.ArrowDatum <- function(x, decreasing = FALSE, na.last 
= NA, ...) {
   # Arrow always sorts nulls at the end of the array. This corresponds to
   # sort(na.last = TRUE). For the other two cases (na.last = NA and
   # na.last = FALSE) we need to use workarounds.
-  # TODO: Implement this more cleanly after ARROW-12063
+  # TODO(ARROW-14085): use NullPlacement ArraySortOptions instead of this 
workaround
   if (is.na(na.last)) {
 # Filter out NAs before sorting
 x <- x$Filter(!is.na(x))
diff --git a/r/R/chunked-array.R b/r/R/chunked-array.R
index 24ca7e6e58..c16f562017 100644
--- a/r/R/chunked-array.R
+++ b/r/R/chunked-array.R
@@ -113,18 +113,6 @@ ChunkedArray <- R6Class("ChunkedArray",
   }
   call_function("filter", self, i, options = list(keep_na = keep_na))
 },
-SortIndices = function(descending = FALSE) {
-  assert_that(is.logical(descending))
-  assert_that(length(descending) == 1L)
-  assert_that(!is.na(descending))
-  # TODO: after ARROW-12042 is closed, review whether this and the
-  # Array$SortIndices definition can be consolidated
-  call_function(
-"sort_indices",
-self,
-opti

[arrow] branch master updated: ARROW-16715: [R] Bump default parquet version (#13555)

2022-07-11 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new f0ff8d015a ARROW-16715: [R] Bump default parquet version (#13555)
f0ff8d015a is described below

commit f0ff8d015a26a780426a13b556d9db082daed200
Author: Neal Richardson 
AuthorDate: Mon Jul 11 11:26:51 2022 -0400

ARROW-16715: [R] Bump default parquet version (#13555)

Also removes deprecated args to `write_parquet()`

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/NAMESPACE  |  1 +
 r/NEWS.md|  1 +
 r/R/arrow-package.R  |  2 +-
 r/R/enums.R  |  2 +-
 r/R/parquet.R| 99 
 r/man/enums.Rd   |  2 +-
 r/man/write_parquet.Rd   | 48 
 r/tests/testthat/_snaps/dataset-write.md |  2 +-
 r/tests/testthat/test-parquet.R  | 52 ++---
 9 files changed, 122 insertions(+), 87 deletions(-)
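A usage sketch of the changed default (assuming the arrow package is installed; pass `version = "1.0"` explicitly if older Parquet readers must consume the file):

```r
library(arrow)

tf <- tempfile(fileext = ".parquet")
write_parquet(mtcars, tf)                   # now writes Parquet format 2.4
write_parquet(mtcars, tf, version = "1.0")  # opt back into the old default
```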

diff --git a/r/NAMESPACE b/r/NAMESPACE
index 5762df9eb0..023e9bb831 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -395,6 +395,7 @@ importFrom(rlang,"%||%")
 importFrom(rlang,":=")
 importFrom(rlang,.data)
 importFrom(rlang,abort)
+importFrom(rlang,arg_match)
 importFrom(rlang,as_function)
 importFrom(rlang,as_label)
 importFrom(rlang,as_quosure)
diff --git a/r/NEWS.md b/r/NEWS.md
index 119974f74a..fca55b047e 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -25,6 +25,7 @@
   * `orders` with year, month, day, hours, minutes, and seconds components are 
supported.
   * the `orders` argument in the Arrow binding works as follows: `orders` are 
transformed into `formats` which subsequently get applied in turn. There is no 
`select_formats` parameter and no inference takes place (like is the case in 
`lubridate::parse_date_time()`).
 * `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have 
been removed. Use the `read/write_feather()` and `read/write_ipc_stream()` 
functions depending on whether you're working with the Arrow IPC file or stream 
format, respectively.
+* `write_parquet()` now defaults to writing Parquet format version 2.4 (was 
1.0). Previously deprecated arguments `properties` and `arrow_properties` have 
been removed; if you need to deal with these lower-level properties objects 
directly, use `ParquetFileWriter`, which `write_parquet()` wraps.
 
 # arrow 8.0.0
 
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index 7b59854f1e..05270ef6bb 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -23,7 +23,7 @@
 #' @importFrom rlang eval_tidy new_data_mask syms env new_environment env_bind 
set_names exec
 #' @importFrom rlang is_bare_character quo_get_expr quo_get_env quo_set_expr 
.data seq2 is_interactive
 #' @importFrom rlang expr caller_env is_character quo_name is_quosure enexpr 
enexprs as_quosure
-#' @importFrom rlang is_list call2 is_empty as_function as_label
+#' @importFrom rlang is_list call2 is_empty as_function as_label arg_match
 #' @importFrom tidyselect vars_pull vars_rename vars_select eval_select
 #' @useDynLib arrow, .registration = TRUE
 #' @keywords internal
diff --git a/r/R/enums.R b/r/R/enums.R
index 17d0484b99..727ca9388c 100644
--- a/r/R/enums.R
+++ b/r/R/enums.R
@@ -122,7 +122,7 @@ FileType <- enum("FileType",
 #' @export
 #' @rdname enums
 ParquetVersionType <- enum("ParquetVersionType",
-  PARQUET_1_0 = 0L, PARQUET_2_0 = 1L
+  PARQUET_1_0 = 0L, PARQUET_2_0 = 1L, PARQUET_2_4 = 2L, PARQUET_2_6 = 3L
 )
 
 #' @export
diff --git a/r/R/parquet.R b/r/R/parquet.R
index 62da28fd1e..8cd9daa857 100644
--- a/r/R/parquet.R
+++ b/r/R/parquet.R
@@ -83,30 +83,29 @@ read_parquet <- function(file,
 #' @param sink A string file path, URI, or [OutputStream], or path in a file
 #' system (`SubTreeFileSystem`)
 #' @param chunk_size how many rows of data to write to disk at once. This
-#' directly corresponds to how many rows will be in each row group in parquet.
-#' If `NULL`, a best guess will be made for optimal size (based on the number of
-#'  columns and number of rows), though if the data has fewer than 250 million
-#'  cells (rows x cols), then the total number of rows is used.
-#' @param version parquet version, "1.0" or "2.0". Default "1.0". Numeric values
-#'   are coerced to character.
+#'    directly corresponds to how many rows will be in each row group in
+#'    parquet. If `NULL`, a best guess will be made for optimal size (based on
+#'    the number of columns and number of rows), though if the data has fewer
+#'    than 250 million cells (rows x cols), then the total number of rows is
+#'    used.
+#' @param version parquet version: "1.0", "2.0" (de

[arrow] branch master updated (fdcf63a1ed -> 8042f001fb)

2022-07-08 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from fdcf63a1ed ARROW-16828: [R][Packaging] Enable Brotli and BZ2 on MacOS and Windows (#13484)
 add 8042f001fb ARROW-16405: [R][CI] Use nightlies.apache.org as dev repo (#13241)

No new revisions were added by this update.

Summary of changes:
 docs/source/developers/guide/resources.rst |  2 +-
 r/NEWS.md  |  4 +++-
 r/R/install-arrow.R|  2 +-
 r/README.md|  4 ++--
 r/tools/nixlibs.R  |  2 +-
 r/tools/winlibs.R  |  2 +-
 r/vignettes/developers/install_details.Rmd | 15 ---
 r/vignettes/developers/setup.Rmd   | 21 +++--
 r/vignettes/install.Rmd|  2 +-
 9 files changed, 25 insertions(+), 29 deletions(-)



[arrow] branch master updated: ARROW-16828: [R][Packaging] Enable Brotli and BZ2 on MacOS and Windows (#13484)

2022-07-08 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new fdcf63a1ed ARROW-16828: [R][Packaging] Enable Brotli and BZ2 on MacOS and Windows (#13484)
fdcf63a1ed is described below

commit fdcf63a1ed94a17a0f05ed78a82d8af730f048a4
Author: Will Jones 
AuthorDate: Fri Jul 8 10:46:01 2022 -0700

ARROW-16828: [R][Packaging] Enable Brotli and BZ2 on MacOS and Windows 
(#13484)

MacOS was missing Brotli and BZ2. Windows was missing BZ2. After this, 
MacOS and Windows will have all compressions shipped in binaries.

Authored-by: Will Jones 
Signed-off-by: Neal Richardson 
---
 ci/scripts/PKGBUILD  | 2 ++
 ci/scripts/r_windows_build.sh| 6 +++---
 dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb | 3 +++
 r/configure.win  | 2 +-
 r/tests/testthat/test-compressed.R   | 2 ++
 r/tools/autobrew | 2 +-
 6 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/ci/scripts/PKGBUILD b/ci/scripts/PKGBUILD
index ea17fba17e..428447d263 100644
--- a/ci/scripts/PKGBUILD
+++ b/ci/scripts/PKGBUILD
@@ -25,6 +25,7 @@ arch=("any")
 url="https://arrow.apache.org/"
 license=("Apache-2.0")
 depends=("${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp"
+ "${MINGW_PACKAGE_PREFIX}-bzip2"
  "${MINGW_PACKAGE_PREFIX}-curl" # for google-cloud-cpp bundled build
  "${MINGW_PACKAGE_PREFIX}-libutf8proc"
  "${MINGW_PACKAGE_PREFIX}-re2"
@@ -123,6 +124,7 @@ build() {
 -DARROW_WITH_ZLIB=ON \
 -DARROW_WITH_ZSTD=ON \
 -DARROW_WITH_BROTLI=ON \
+-DARROW_WITH_BZ2=ON \
 -DARROW_ZSTD_USE_SHARED=OFF \
 -DARROW_CXXFLAGS="${CPPFLAGS}" \
 -DCMAKE_BUILD_TYPE="release" \
diff --git a/ci/scripts/r_windows_build.sh b/ci/scripts/r_windows_build.sh
index 3334eab866..c361af1d26 100755
--- a/ci/scripts/r_windows_build.sh
+++ b/ci/scripts/r_windows_build.sh
@@ -87,7 +87,7 @@ if [ -d mingw64/lib/ ]; then
   # These may be from https://dl.bintray.com/rtools/backports/
   cp $MSYS_LIB_DIR/mingw64/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/x64
   # These are from https://dl.bintray.com/rtools/mingw{32,64}/
-  cp $MSYS_LIB_DIR/mingw64/lib/lib{zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
+  cp $MSYS_LIB_DIR/mingw64/lib/lib{zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64
 fi
 
 # Same for the 32-bit versions
@@ -97,7 +97,7 @@ if [ -d mingw32/lib/ ]; then
   mkdir -p $DST_DIR/lib/i386
   mv mingw32/lib/*.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
   cp $MSYS_LIB_DIR/mingw32/lib/lib{thrift,snappy}.a $DST_DIR/${RWINLIB_LIB_DIR}/i386
-  cp $MSYS_LIB_DIR/mingw32/lib/lib{zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
+  cp $MSYS_LIB_DIR/mingw32/lib/lib{zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/i386
 fi
 
 # Do the same also for ucrt64
@@ -105,7 +105,7 @@ if [ -d ucrt64/lib/ ]; then
   ls $MSYS_LIB_DIR/ucrt64/lib/
   mkdir -p $DST_DIR/lib/x64-ucrt
   mv ucrt64/lib/*.a $DST_DIR/lib/x64-ucrt
-  cp $MSYS_LIB_DIR/ucrt64/lib/lib{thrift,snappy,zstd,lz4,brotli*,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64-ucrt
+  cp $MSYS_LIB_DIR/ucrt64/lib/lib{thrift,snappy,zstd,lz4,brotli*,bz2,crypto,curl,ss*,utf8proc,re2,aws*}.a $DST_DIR/lib/x64-ucrt
 fi
 
 # Create build artifact
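The `cp` lines in the r_windows_build.sh hunks above lean on bash brace expansion combined with globbing to collect the static libraries; a minimal sketch of the mechanism (the library names here are illustrative, not the real build inputs):

```shell
# Brace expansion happens before pathname expansion: lib{zstd,lz4,bz2}.a
# expands to three separate words, each of which is then used as a glob
# pattern (or passed through literally if nothing matches).
# Run explicitly under bash, since a plain POSIX sh may lack brace expansion.
bash -c 'echo lib{zstd,lz4,bz2}.a'
# prints: libzstd.a liblz4.a libbz2.a
```

This is why adding `bz2` to the brace list is enough to pick up `libbz2.a` alongside the other archives.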
diff --git a/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb b/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb
index 45c04463b6..dde994ab43 100644
--- a/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb
+++ b/dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb
@@ -31,6 +31,7 @@ class ApacheArrow < Formula
 
   # NOTE: if you add something here, be sure to add to PKG_LIBS in 
r/tools/autobrew
   depends_on "boost" => :build
+  depends_on "brotli"
   depends_on "cmake" => :build
   depends_on "aws-sdk-cpp"
   depends_on "lz4"
@@ -57,6 +58,8 @@ class ApacheArrow < Formula
   -DARROW_S3=ON
   -DARROW_USE_GLOG=OFF
   -DARROW_VERBOSE_THIRDPARTY_BUILD=ON
+  -DARROW_WITH_BROTLI=ON
+  -DARROW_WITH_BZ2=ON
   -DARROW_WITH_LZ4=ON
   -DARROW_WITH_SNAPPY=ON
   -DARROW_WITH_ZLIB=ON
diff --git a/r/configure.win b/r/configure.win
index dfd2c87ab4..7aa7e47fc1 100755
--- a/r/configure.win
+++ b/r/configure.win
@@ -64,7 +64,7 @@ function configure_release() {
   PKG_LIBS="-L${RWINLIB}/lib"'$(subst gcc,,$(COMPILED_BY))$(R_ARCH) '
   PKG_LIBS="$PKG_LIBS -L${RWINLIB}/lib"'$(R_ARCH)$(CRT) '
   PKG_LIBS="$PKG_LIBS -lparquet -larrow_dataset -larrow 
-larrow_bundled_depend

[arrow] branch master updated: ARROW-16268: [R] Remove long-deprecated functions (#13550)

2022-07-08 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a48c09e6aa ARROW-16268: [R] Remove long-deprecated functions (#13550)
a48c09e6aa is described below

commit a48c09e6aa3180512354ef9c1ded2f479d09c25e
Author: Neal Richardson 
AuthorDate: Fri Jul 8 13:27:26 2022 -0400

ARROW-16268: [R] Remove long-deprecated functions (#13550)

Also has a fix for the check NOTE about union_all and distinct.

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/DESCRIPTION  |  3 +--
 r/NAMESPACE|  4 +--
 r/NEWS.md  | 11 
 r/R/dataset-scan.R | 18 -
 r/R/deprecated.R   | 40 
 r/R/dplyr-union.R  |  2 +-
 r/man/ArrayData.Rd |  6 +++--
 r/man/FileSystem.Rd|  1 +
 r/man/Scalar.Rd|  6 +++--
 r/man/Scanner.Rd   |  3 ---
 r/man/array.Rd |  6 +++--
 r/man/arrow-package.Rd |  2 +-
 r/man/arrow_info.Rd|  3 +++
 r/man/read_ipc_stream.Rd   | 11 +++-
 r/man/write_ipc_stream.Rd  |  7 ++---
 r/tests/testthat/test-Table.R  | 53 +++---
 r/tests/testthat/test-arrow-info.R |  4 +++
 r/tests/testthat/test-dataset.R| 18 -
 r/tests/testthat/test-type.R   |  9 +++
 19 files changed, 56 insertions(+), 151 deletions(-)

diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 5385877696..2cbbec054a 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -40,7 +40,7 @@ Imports:
 utils,
 vctrs
 Roxygen: list(markdown = TRUE, r6 = FALSE, load = "source")
-RoxygenNote: 7.1.2
+RoxygenNote: 7.2.0
 Config/testthat/edition: 3
 VignetteBuilder: knitr
 Suggests:
@@ -88,7 +88,6 @@ Collate:
 'dataset-partition.R'
 'dataset-scan.R'
 'dataset-write.R'
-'deprecated.R'
 'dictionary.R'
 'dplyr-arrange.R'
 'dplyr-collect.R'
diff --git a/r/NAMESPACE b/r/NAMESPACE
index e98cdd51fb..5762df9eb0 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -195,6 +195,7 @@ export(FileType)
 export(FixedSizeListArray)
 export(FixedSizeListType)
 export(FragmentScanOptions)
+export(GcsFileSystem)
 export(HivePartitioning)
 export(HivePartitioningFactory)
 export(InMemoryDataset)
@@ -251,6 +252,7 @@ export(arrow_available)
 export(arrow_info)
 export(arrow_table)
 export(arrow_with_dataset)
+export(arrow_with_gcs)
 export(arrow_with_json)
 export(arrow_with_parquet)
 export(arrow_with_s3)
@@ -330,7 +332,6 @@ export(null)
 export(num_range)
 export(one_of)
 export(open_dataset)
-export(read_arrow)
 export(read_csv_arrow)
 export(read_delim_arrow)
 export(read_feather)
@@ -366,7 +367,6 @@ export(utf8)
 export(value_counts)
 export(vctrs_extension_array)
 export(vctrs_extension_type)
-export(write_arrow)
 export(write_csv_arrow)
 export(write_dataset)
 export(write_feather)
diff --git a/r/NEWS.md b/r/NEWS.md
index d88be22964..45a963ca48 100644
--- a/r/NEWS.md
+++ b/r/NEWS.md
@@ -22,6 +22,7 @@
 * `lubridate::parse_date_time()` datetime parser:
   * `orders` with year, month, day, hours, minutes, and seconds components are supported.
   * the `orders` argument in the Arrow binding works as follows: `orders` are transformed into `formats` which subsequently get applied in turn. There is no `select_formats` parameter and no inference takes place (like is the case in `lubridate::parse_date_time()`).
+* `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have been removed. Use the `read/write_feather()` and `read/write_ipc_stream()` functions depending on whether you're working with the Arrow IPC file or stream format, respectively.
 
 # arrow 8.0.0
 
@@ -50,7 +51,7 @@
 
 ## Enhancements to date and time support
 
-* `read_csv_arrow()`'s readr-style type `T` is mapped to `timestamp(unit = "ns")` 
+* `read_csv_arrow()`'s readr-style type `T` is mapped to `timestamp(unit = "ns")`
   instead of `timestamp(unit = "s")`.
 * For Arrow dplyr queries, added additional `{lubridate}` features and fixes:
   * New component extraction functions:
@@ -86,14 +87,14 @@
   record batches, arrays, chunked arrays, record batch readers, schemas, and
   data types. This allows other packages to define custom conversions from 
their
   types to Arrow objects, including extension arrays.
-* Custom [extension types and arrays](https://arrow.apache.org/docs/format/Columnar.html#extension-types) 
+* Custom [extension types and arrays](https://arrow.apache.org/docs/format/Columnar.html#extension-types)
   can be created and registered, allowing other packages to
   define their own array types. Extension arrays wrap regular Arrow array 
types and
   pr

[arrow] branch master updated: MINOR: [R][CI] Add all available package versions to PACKAGES (#13551)

2022-07-08 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 0fdb9cc08b MINOR: [R][CI] Add all available package versions to PACKAGES (#13551)
0fdb9cc08b is described below

commit 0fdb9cc08be53ff374d45af109d5ce2d6bb29a82
Author: Jacob Wujciak-Jens 
AuthorDate: Fri Jul 8 16:42:38 2022 +0200

MINOR: [R][CI] Add all available package versions to PACKAGES (#13551)

This overrides the default `latestOnly = TRUE` so all available R package 
versions are added to the repository index.

Authored-by: Jacob Wujciak-Jens 
Signed-off-by: Neal Richardson 
---
 .github/workflows/r_nightly.yml | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/r_nightly.yml b/.github/workflows/r_nightly.yml
index fc93dde017..a47f69136f 100644
--- a/.github/workflows/r_nightly.yml
+++ b/.github/workflows/r_nightly.yml
@@ -158,7 +158,11 @@ jobs:
 run: |
   # folder that we sync to nightlies.apache.org
   repo_root <- "repo"
-  tools::write_PACKAGES(file.path(repo_root, "src/contrib"), type = "source", verbose = TRUE)
+  tools::write_PACKAGES(file.path(repo_root, "src/contrib"),
+type = "source",
+verbose = TRUE,
+latestOnly = FALSE
+  )
 
   repo_dirs <- list.dirs(repo_root)
   # find dirs with binary R packages: e.g. */contrib/4.1
@@ -167,7 +171,11 @@ jobs:
 
   for (dir in pkg_dirs) {
 on_win <- grepl("windows", dir)
-tools::write_PACKAGES(dir, type = ifelse(on_win, "win.binary", "mac.binary"), verbose = TRUE )
+tools::write_PACKAGES(dir,
+  type = ifelse(on_win, "win.binary", "mac.binary"),
+  verbose = TRUE,
+  latestOnly = FALSE
+)
   }
   - name: Show repo contents
 run: tree repo



[arrow] branch master updated (1a35aa6c57 -> 2aa7923fb6)

2022-07-07 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 1a35aa6c57 ARROW-16679: [R] configure fails if CDPATH is not null (#13313)
 add 2aa7923fb6 MINOR: [R] Fix nightly failures with r_vec_size (#13538)

No new revisions were added by this update.

Summary of changes:
 r/src/arrowExports.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[arrow] branch master updated: ARROW-16679: [R] configure fails if CDPATH is not null (#13313)

2022-07-07 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a35aa6c57 ARROW-16679: [R] configure fails if CDPATH is not null (#13313)
1a35aa6c57 is described below

commit 1a35aa6c57379d922f3086da708077e8786aa06e
Author: Jacob Wujciak-Jens 
AuthorDate: Thu Jul 7 22:44:02 2022 +0200

ARROW-16679: [R] configure fails if CDPATH is not null (#13313)

Authored-by: Jacob Wujciak-Jens 
Signed-off-by: Neal Richardson 
---
 r/configure | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/r/configure b/r/configure
index d62c58eeda..68dfd5f5ee 100755
--- a/r/configure
+++ b/r/configure
@@ -177,7 +177,8 @@ else
 # Assume nixlibs.R has handled and messaged about its failure already
 #
 # TODO: what about non-bundled deps?
-BUNDLED_LIBS=`cd $LIB_DIR && ls *.a`
+# Set CDPATH locally to prevent interference from global CDPATH (if set) 
+BUNDLED_LIBS=`CDPATH=''; cd $LIB_DIR && ls *.a`
 BUNDLED_LIBS=`echo "$BUNDLED_LIBS" | sed -e "s/\\.a lib/ -l/g" | sed -e "s/\\.a$//" | sed -e "s/^lib/-l/" | tr '\n' ' ' | sed -e "s/ $//"`
 PKG_DIRS="-L`pwd`/$LIB_DIR"
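The failure mode this commit fixes can be reproduced directly: when `cd` resolves a relative directory through a non-empty `CDPATH` entry, POSIX `cd` prints the resolved path to stdout, which pollutes the command substitution that builds `BUNDLED_LIBS`. A small sketch of the fixed path plus the follow-up sed chain (the temporary directory and library names are made up for illustration):

```shell
# Stand-in for $LIB_DIR with a couple of fake static libraries.
demo=$(mktemp -d)
mkdir -p "$demo/lib"
touch "$demo/lib/libarrow.a" "$demo/lib/libparquet.a"

# The fix applied above: clear CDPATH inside the substitution so `cd`
# cannot resolve through a global CDPATH entry (and so never echoes a path).
BUNDLED_LIBS=$(CDPATH=''; cd "$demo/lib" && ls *.a)
echo "$BUNDLED_LIBS"

# The subsequent sed/tr chain turns the listing into linker flags,
# e.g. "libarrow.a libparquet.a" -> "-larrow -lparquet".
flags=$(echo "$BUNDLED_LIBS" \
  | sed -e "s/\.a lib/ -l/g" -e "s/\.a$//" -e "s/^lib/-l/" \
  | tr '\n' ' ' | sed -e "s/ $//")
echo "$flags"
# prints: -larrow -lparquet
```

Without the `CDPATH=''`, a stray path echoed by `cd` would end up at the front of `BUNDLED_LIBS` and be mangled into a bogus `-l` flag.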
 



[arrow] branch master updated: ARROW-16752: [R] Rework Linux binary installation (#13464)

2022-07-06 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new c492ef497a ARROW-16752: [R] Rework Linux binary installation (#13464)
c492ef497a is described below

commit c492ef497a62e600c9436f2a92dace1190c7a465
Author: Neal Richardson 
AuthorDate: Wed Jul 6 10:16:31 2022 -0400

ARROW-16752: [R] Rework Linux binary installation (#13464)

See the jira for the main behavior changes here. Other changes of note:

* There are more brief messages printed to the installation log, even in 
the default "quiet" mode, that indicate which branch of the logic in nixlibs.R 
you've gone through. They're factual and generally connected to the tests that 
are being run, but they are worded somewhat ambiguously or coded, so as not to 
run afoul of censors should they appear in the wrong context. This should help 
us in the triaging of installation failures, even in circumstances where we 
can't enable greater verbosity.
* There is a start of a test suite for nixlibs.R, run separately from the 
package tests. It has been wired up to run in `ci/scripts/r_test.sh`.

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 ci/scripts/r_docker_configure.sh  |  21 +--
 ci/scripts/r_test.sh  |   3 +
 dev/release/rat_exclude_files.txt |   1 +
 dev/tasks/macros.jinja|  18 +--
 dev/tasks/r/github.packages.yml   |  70 ++
 r/tools/nixlibs-allowlist.txt |   4 +
 r/tools/nixlibs.R | 268 +++---
 r/tools/test-nixlibs.R| 112 
 r/vignettes/install.Rmd   | 112 
 9 files changed, 434 insertions(+), 175 deletions(-)

diff --git a/ci/scripts/r_docker_configure.sh b/ci/scripts/r_docker_configure.sh
index 9f93ba2b61..2bc5a4806f 100755
--- a/ci/scripts/r_docker_configure.sh
+++ b/ci/scripts/r_docker_configure.sh
@@ -19,12 +19,14 @@
 set -ex
 
 : ${R_BIN:=R}
+# This is where our docker setup puts things; set this to run outside of docker
+: ${ARROW_SOURCE_HOME:=/arrow}
 
 # The Dockerfile should have put this file here
-if [ -f "/arrow/ci/etc/rprofile" ]; then
+if [ -f "${ARROW_SOURCE_HOME}/ci/etc/rprofile" ]; then
   # Ensure parallel R package installation, set CRAN repo mirror,
   # and use pre-built binaries where possible
-  cat /arrow/ci/etc/rprofile >> $(${R_BIN} RHOME)/etc/Rprofile.site
+  cat ${ARROW_SOURCE_HOME}/ci/etc/rprofile >> $(${R_BIN} RHOME)/etc/Rprofile.site
 fi
 
 # Ensure parallel compilation of C/C++ code
@@ -74,6 +76,9 @@ if [ "$RHUB_PLATFORM" = "linux-x86_64-fedora-clang" ]; then
   sed -i.bak -E -e 's/(CXX1?1? =.*)/\1 -stdlib=libc++/g' $(${R_BIN} RHOME)/etc/Makeconf
   rm -rf $(${R_BIN} RHOME)/etc/Makeconf.bak
 
+  sed -i.bak -E -e 's/(\-std=gnu\+\+)/-std=c++/g' $(${R_BIN} RHOME)/etc/Makeconf
+  rm -rf $(${R_BIN} RHOME)/etc/Makeconf.bak
+
   sed -i.bak -E -e 's/(CXXFLAGS = )(.*)/\1 -g -O3 -Wall -pedantic -frtti -fPIC/' $(${R_BIN} RHOME)/etc/Makeconf
   rm -rf $(${R_BIN} RHOME)/etc/Makeconf.bak
 
@@ -88,8 +93,8 @@ if [[ "$DEVTOOLSET_VERSION" -gt 0 ]]; then
   $PACKAGE_MANAGER install -y "devtoolset-$DEVTOOLSET_VERSION"
 fi
 
-if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_R_DEV" == "TRUE" ]; then
-  # Install curl and openssl for S3 support
+if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_GCS" == "ON" ] || [ "$ARROW_R_DEV" == "TRUE" ]; then
+  # Install curl and openssl for S3/GCS support
   if [ "$PACKAGE_MANAGER" = "apt-get" ]; then
 apt-get install -y libcurl4-openssl-dev libssl-dev
   else
@@ -97,12 +102,12 @@ if [ "$ARROW_S3" == "ON" ] || [ "$ARROW_R_DEV" == "TRUE" ]; then
   fi
 
   # The Dockerfile should have put this file here
-  if [ -f "/arrow/ci/scripts/install_minio.sh" ] && [ "`which wget`" ]; then
-/arrow/ci/scripts/install_minio.sh latest /usr/local
+  if [ -f "${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh" ] && [ "`which wget`" ]; then
+${ARROW_SOURCE_HOME}/ci/scripts/install_minio.sh latest /usr/local
   fi
 
-  if [ -f "/arrow/ci/scripts/install_gcs_testbench.sh" ] && [ "`which pip`" ]; then
-/arrow/ci/scripts/install_gcs_testbench.sh default
+  if [ -f "${ARROW_SOURCE_HOME}/ci/scripts/install_gcs_testbench.sh" ] && [ "`which pip`" ]; then
+${ARROW_SOURCE_HOME}/ci/scripts/install_gcs_testbench.sh default
   fi
 fi
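The `: ${ARROW_SOURCE_HOME:=/arrow}` line added in this hunk uses the shell's assign-default parameter expansion, which is what lets the script "run outside of docker" when the caller exports a different path. A quick sketch of the idiom (the second value is illustrative):

```shell
unset ARROW_SOURCE_HOME
# `:` is the no-op command; the ${VAR:=default} expansion assigns the
# default only when VAR is unset or empty, so the environment can override it.
: ${ARROW_SOURCE_HOME:=/arrow}
echo "$ARROW_SOURCE_HOME"
# prints: /arrow

ARROW_SOURCE_HOME=/src/arrow
: ${ARROW_SOURCE_HOME:=/arrow}
echo "$ARROW_SOURCE_HOME"
# prints: /src/arrow
```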
 
diff --git a/ci/scripts/r_test.sh b/ci/scripts/r_test.sh
index 8429187d88..0328df2384 100755
--- a/ci/scripts/r_test.sh
+++ b/ci/scripts/r_test.sh
@@ -26,6 +26,

[arrow] branch master updated: ARROW-16871: [R] Implement exp() and sqrt() in Arrow dplyr queries (#13517)

2022-07-05 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d1d03f05a ARROW-16871: [R] Implement exp() and sqrt() in Arrow dplyr queries (#13517)
7d1d03f05a is described below

commit 7d1d03f05ada61aa11b2ac432faf349eda8f030e
Author: Christopher D. Higgins <40569964+higg...@users.noreply.github.com>
AuthorDate: Tue Jul 5 16:49:14 2022 -0400

ARROW-16871: [R] Implement exp() and sqrt() in Arrow dplyr queries (#13517)

In response to https://issues.apache.org/jira/browse/ARROW-16871
- implement `sqrt` and `exp` bindings for `dplyr`
    - change `sqrt` in `arrow-datum.R` to use `sqrt_checked` rather than `power_checked`
-  write tests for `sqrt` and `exp`

Authored-by: Christopher D. Higgins 
<40569964+higg...@users.noreply.github.com>
Signed-off-by: Neal Richardson 
---
 r/R/arrow-datum.R|  2 +-
 r/R/dplyr-funcs-math.R   | 15 +++
 r/tests/testthat/test-dplyr-funcs-math.R | 22 ++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/r/R/arrow-datum.R b/r/R/arrow-datum.R
index 4ec5f8f9d6..39362628bb 100644
--- a/r/R/arrow-datum.R
+++ b/r/R/arrow-datum.R
@@ -123,7 +123,7 @@ Math.ArrowDatum <- function(x, ..., base = exp(1), digits = 0) {
   x,
   options = list(ndigits = digits, round_mode = RoundMode$HALF_TO_EVEN)
 ),
-sqrt = eval_array_expression("power_checked", x, 0.5),
+sqrt = eval_array_expression("sqrt_checked", x),
 exp = eval_array_expression("power_checked", exp(1), x),
 signif = ,
 expm1 = ,
diff --git a/r/R/dplyr-funcs-math.R b/r/R/dplyr-funcs-math.R
index b92c202d04..0ba2ddc856 100644
--- a/r/R/dplyr-funcs-math.R
+++ b/r/R/dplyr-funcs-math.R
@@ -80,4 +80,19 @@ register_bindings_math <- function() {
   options = list(ndigits = digits, round_mode = RoundMode$HALF_TO_EVEN)
 )
   })
+
+  register_binding("sqrt", function(x) {
+build_expr(
+  "sqrt_checked",
+  x
+)
+  })
+
+  register_binding("exp", function(x) {
+build_expr(
+  "power_checked",
+  exp(1),
+  x
+)
+  })
 }
diff --git a/r/tests/testthat/test-dplyr-funcs-math.R b/r/tests/testthat/test-dplyr-funcs-math.R
index dd982c9942..47a9f0b7c0 100644
--- a/r/tests/testthat/test-dplyr-funcs-math.R
+++ b/r/tests/testthat/test-dplyr-funcs-math.R
@@ -330,3 +330,25 @@ test_that("floor division maintains type consistency with R", {
 df
   )
 })
+
+test_that("exp()", {
+  df <- tibble(x = c(1:5, NA))
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(y = exp(x)) %>%
+  collect(),
+df
+  )
+})
+
+test_that("sqrt()", {
+  df <- tibble(x = c(1:5, NA))
+
+  compare_dplyr_binding(
+.input %>%
+  mutate(y = sqrt(x)) %>%
+  collect(),
+df
+  )
+})



[arrow] branch master updated: ARROW-16912: [R][CI] Fix nightly centos package without GCS (#13441)

2022-06-29 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 2c67e72f3a ARROW-16912: [R][CI] Fix nightly centos package without GCS (#13441)
2c67e72f3a is described below

commit 2c67e72f3aa75029f277653c9c32af29c485721f
Author: Neal Richardson 
AuthorDate: Wed Jun 29 17:46:52 2022 -0400

ARROW-16912: [R][CI] Fix nightly centos package without GCS (#13441)

cc @assignUser

Most of the diff seems to be my editor trimming whitespace. The actual changes:

* Rename `r-nightly-packages` to `r-binary-packages` since they can be run on demand (not only nightly)
* Add it to the `r` crossbow group
* Turn ARROW_GCS=OFF in the centos-7 package. Where this setting happens is not obvious.

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 .github/workflows/r_nightly.yml | 10 +-
 dev/tasks/r/github.packages.yml | 19 +--
 dev/tasks/tasks.yml |  5 +++--
 docker-compose.yml  | 14 --
 4 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/.github/workflows/r_nightly.yml b/.github/workflows/r_nightly.yml
index e4693a155f..9ee0968d85 100644
--- a/.github/workflows/r_nightly.yml
+++ b/.github/workflows/r_nightly.yml
@@ -17,7 +17,7 @@
 
 name: Upload R Nightly builds
 # This workflow downloads the (nightly) binaries created in crossbow and 
uploads them
-# to nightlies.apache.org. Due to authorization requirements, this upload 
can't be done 
+# to nightlies.apache.org. Due to authorization requirements, this upload 
can't be done
 
 # from the crossbow repository.
 
@@ -51,7 +51,7 @@ jobs:
   fetch-depth: 0
   path: crossbow
   repository: ursacomputing/crossbow
-  ref: master 
+  ref: master
   - name: Set up Python
 uses: actions/setup-python@v3
 with:
@@ -70,7 +70,7 @@ jobs:
   fi
   echo $PREFIX
 
-  archery crossbow download-artifacts -f r-nightly-packages -t binaries $PREFIX
+  archery crossbow download-artifacts -f r-binary-packages -t binaries $PREFIX
 
   if [ -n "$(ls -A binaries/*/*/)" ]; then
 echo "Found files!"
@@ -83,12 +83,12 @@ jobs:
 run: |
   # folder that we rsync to nightlies.apache.org
   repo_root <- "repo"
-  # The binaries are in a nested dir 
+  # The binaries are in a nested dir
   # so we need to find the correct path.
   art_path <- list.files("binaries",
 recursive = TRUE,
 include.dirs = TRUE,
-pattern = "r-nightly-packages$",
+pattern = "r-binary-packages$",
 full.names = TRUE
   )
 
diff --git a/dev/tasks/r/github.packages.yml b/dev/tasks/r/github.packages.yml
index 4f5caa0e1c..76beb6400c 100644
--- a/dev/tasks/r/github.packages.yml
+++ b/dev/tasks/r/github.packages.yml
@@ -18,7 +18,7 @@
 {% import 'macros.jinja' as macros with context %}
 
 # This allows us to set a custom version via param:
-# crossbow submit --param custom_version=8.5.3 r-nightly-packages
+# crossbow submit --param custom_version=8.5.3 r-binary-packages
 # if the param is unset defaults to the usual Ymd naming scheme
{% set package_version = custom_version|default("\\2.\'\"$(date +%Y%m%d)\"\'") %}
 # We need this as boolean and string
@@ -44,7 +44,7 @@ jobs:
   - name: Save Version
 id: save-version
 shell: bash
-run: | 
+run: |
   echo "::set-output name=pkg_version::$(grep ^Version arrow/r/DESCRIPTION | sed s/Version:\ //)"
 
   - uses: r-lib/actions/setup-r@v2
@@ -99,7 +99,7 @@ jobs:
   cd arrow/r/libarrow/dist
   # These files were created by the docker user so we have to sudo to get them
   sudo -E zip -r $PKG_FILE lib/ include/
-  
+
   - name: Upload binary artifact
 uses: actions/upload-artifact@v3
 with:
@@ -131,7 +131,7 @@ jobs:
 uses: actions/upload-artifact@v3
 with:
   name: r-lib__libarrow__bin__windows
-  path: build/arrow-*.zip 
+  path: build/arrow-*.zip
 
   r-packages:
 needs: [source, windows-cpp]
@@ -158,7 +158,7 @@ jobs:
   - name: Build Binary
 id: build
 shell: Rscript {0}
-env:  
+env:
   ARROW_R_DEV: TRUE
 run: |
   on_windows <- tolower(Sys.info()[["sysname"]]) == "windows"
@@ -171,7 +171,7 @@ jobs:
 
   cat("Remove old arrow version.\n")
   remove.packages("arrow")
-  
+
   # Build
   Sys.setenv(MAKEFLAGS = paste0("-j", parallel::detect
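The "Save Version" step earlier in this workflow shells out to a `grep | sed` pipeline to read the package version out of the R `DESCRIPTION` file. A standalone sketch of that pipeline (the file contents below are mocked up, not the real package metadata):

```shell
tmp=$(mktemp -d)
# A minimal stand-in for arrow/r/DESCRIPTION.
printf 'Package: arrow\nVersion: 9.0.0.9000\n' > "$tmp/DESCRIPTION"

# Same pipeline as the workflow step: keep the Version line, strip the key.
pkg_version=$(grep ^Version "$tmp/DESCRIPTION" | sed 's/Version: //')
echo "$pkg_version"
# prints: 9.0.0.9000
```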

[arrow] branch dont-r-nightly-on-fork created (now bc99176fe5)

2022-06-28 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch dont-r-nightly-on-fork
in repository https://gitbox.apache.org/repos/asf/arrow.git


  at bc99176fe5 Update r_nightly.yml

This branch includes the following new commits:

 new bc99176fe5 Update r_nightly.yml

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow] 01/01: Update r_nightly.yml

2022-06-28 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch dont-r-nightly-on-fork
in repository https://gitbox.apache.org/repos/asf/arrow.git

commit bc99176fe5b26a6ec45ee3a877b8c74bf6036a79
Author: Neal Richardson 
AuthorDate: Tue Jun 28 20:21:11 2022 -0400

Update r_nightly.yml
---
 .github/workflows/r_nightly.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/r_nightly.yml b/.github/workflows/r_nightly.yml
index 0f657a85ad..e4693a155f 100644
--- a/.github/workflows/r_nightly.yml
+++ b/.github/workflows/r_nightly.yml
@@ -34,6 +34,7 @@ on:
 
 jobs:
   upload:
+if: github.repository == 'apache/arrow'
 runs-on: ubuntu-latest
 steps:
   - name: Checkout Arrow



[arrow] branch master updated: ARROW-16510: [R] Add bindings for GCS filesystem (#13404)

2022-06-26 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ac0959ac1 ARROW-16510: [R] Add bindings for GCS filesystem (#13404)
3ac0959ac1 is described below

commit 3ac0959ac168caebb19dfbfbc8881323e694a4ae
Author: Neal Richardson 
AuthorDate: Sun Jun 26 09:43:31 2022 -0400

ARROW-16510: [R] Add bindings for GCS filesystem (#13404)

This adds basic bindings for GcsFileSystem to R, turns it on in the macOS, 
Windows, and Linux packaging (same handling as ARROW_S3), and basic R tests.

Followups:

- Bindings for FromImpersonatedServiceAccount (ARROW-16885)
- Set up testbench for fuller tests, like how we do with minio (ARROW-16879)
- GcsFileSystem::Make should return Result (ARROW-16884)
- Explore auth integration/compatibility with `gargle`, `googleAuthR`, etc.: can we pick up the same credentials they use (ARROW-16880)
- macOS binary packaging: push dependencies upstream (ARROW-16883)
- Windows binary packaging: push dependencies upstream (ARROW-16878)
- Update cloud/filesystem documentation (ARROW-16887)

Lead-authored-by: Neal Richardson 
Co-authored-by: Sutou Kouhei 
Signed-off-by: Neal Richardson 
---
 .github/workflows/cpp.yml  |   8 +-
 .github/workflows/r.yml|   2 +-
 ci/scripts/PKGBUILD|   5 +
 ci/scripts/r_windows_build.sh  |   6 +-
 .../google-cloud-cpp-curl-static-windows.patch |  31 +++
 cpp/cmake_modules/ThirdpartyToolchain.cmake| 276 +
 cpp/src/arrow/filesystem/gcsfs.h   |   1 +
 cpp/src/arrow/filesystem/type_fwd.h|   1 +
 cpp/thirdparty/versions.txt|   4 +-
 .../homebrew-formulae/autobrew/apache-arrow.rb |   1 +
 dev/tasks/r/github.macos.brew.yml  |   2 +
 dev/tasks/tasks.yml|   2 +-
 r/R/arrow-info.R   |  11 +-
 r/R/arrowExports.R |   5 +
 r/R/filesystem.R   |  79 +-
 r/configure|  36 +--
 r/configure.win|  13 +-
 r/data-raw/codegen.R   |  63 ++---
 r/inst/build_arrow_static.sh   |   1 +
 r/src/arrowExports.cpp |  27 ++
 r/src/filesystem.cpp   |  81 ++
 r/tests/testthat/test-gcs.R|  60 +
 r/tools/autobrew   |   1 +
 r/tools/nixlibs.R  |  42 +++-
 r/vignettes/developers/setup.Rmd   |   2 +
 r/vignettes/install.Rmd| 101 
 26 files changed, 627 insertions(+), 234 deletions(-)

diff --git a/.github/workflows/cpp.yml b/.github/workflows/cpp.yml
index b914b7df52..acb3270a5d 100644
--- a/.github/workflows/cpp.yml
+++ b/.github/workflows/cpp.yml
@@ -276,8 +276,12 @@ jobs:
   ARROW_DATASET: ON
   ARROW_FLIGHT: ON
   ARROW_GANDIVA: ON
-  # google-could-cpp uses _dupenv_s() but it can't be used with msvcrt.
-  # We need to use ucrt to use _dupenv_s().
+  # With GCS on,
+  # * MinGW 32 build OOMs (maybe turn off unity build?)
+  # * MinGW 64 fails to compile the GCS filesystem tests, some conflict
+  #   with boost. First error says:
+  #     D:/a/_temp/msys64/mingw64/include/boost/asio/detail/socket_types.hpp:24:4: error: #error WinSock.h has already been included
+  # TODO(ARROW-16906)
   # ARROW_GCS: ON
   ARROW_HDFS: OFF
   ARROW_HOME: /mingw${{ matrix.mingw-n-bits }}
diff --git a/.github/workflows/r.yml b/.github/workflows/r.yml
index 48d9672c74..86e006d538 100644
--- a/.github/workflows/r.yml
+++ b/.github/workflows/r.yml
@@ -165,7 +165,7 @@ jobs:
 name: AMD64 Windows C++ RTools ${{ matrix.config.rtools }} ${{ matrix.config.arch }}
 runs-on: windows-2019
 if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
-timeout-minutes: 60
+timeout-minutes: 90
 strategy:
   fail-fast: false
   matrix:
diff --git a/ci/scripts/PKGBUILD b/ci/scripts/PKGBUILD
index b9b0194f5c..ea17fba17e 100644
--- a/ci/scripts/PKGBUILD
+++ b/ci/scripts/PKGBUILD
@@ -25,6 +25,7 @@ arch=("any")
 url="https://arrow.apache.org/"
 license=("Apache-2.0")
 depends=("${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp"
+ "${MINGW_PACKAGE_PREFIX}-curl" # for google-cloud-cpp bundled build
  "${MINGW_PACKAGE_PREFIX}-libutf8proc"
  "${MINGW_PACKAGE_PREFIX}-re2"
  "${MINGW_PACKAGE_PREFIX}-thrift"
@@ -79,11 +80,13 @@ build() {
 export 

[arrow] branch master updated: ARROW-16900: [R] Upgrade lintr (#13432)

2022-06-24 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 241c8e6242 ARROW-16900: [R] Upgrade lintr (#13432)
241c8e6242 is described below

commit 241c8e6242044530e4a9ea13661ca78a100f
Author: Neal Richardson 
AuthorDate: Fri Jun 24 12:13:54 2022 -0400

ARROW-16900: [R] Upgrade lintr (#13432)

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 ci/docker/linux-apt-lint.dockerfile | 16 ++--
 r/.lintr|  6 +++---
 r/lint.sh   |  2 +-
 r/vignettes/developers/workflow.Rmd |  4 +---
 4 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/ci/docker/linux-apt-lint.dockerfile b/ci/docker/linux-apt-lint.dockerfile
index 249072ae32..8a679be2eb 100644
--- a/ci/docker/linux-apt-lint.dockerfile
+++ b/ci/docker/linux-apt-lint.dockerfile
@@ -56,20 +56,8 @@ COPY ci/etc/rprofile /arrow/ci/etc/
 RUN cat /arrow/ci/etc/rprofile >> $(R RHOME)/etc/Rprofile.site
 # Also ensure parallel compilation of C/C++ code
RUN echo "MAKEFLAGS=-j$(R -s -e 'cat(parallel::detectCores())')" >> $(R RHOME)/etc/Renviron.site
-
-
-COPY ci/scripts/r_deps.sh /arrow/ci/scripts/
-COPY r/DESCRIPTION /arrow/r/
-# We need to install Arrow's dependencies in order for lintr's namespace searching to work.
-# This could be removed if lintr no longer loads the dependency namespaces (see issues/PRs below)
-RUN /arrow/ci/scripts/r_deps.sh /arrow
-# This fork has a number of changes that have PRs and Issues to resolve upstream:
-#   https://github.com/jimhester/lintr/pull/843
-#   https://github.com/jimhester/lintr/pull/841
-#   https://github.com/jimhester/lintr/pull/845
-#   https://github.com/jimhester/lintr/issues/842
-#   https://github.com/jimhester/lintr/issues/846
-RUN R -e "remotes::install_github('jonkeane/lintr@arrow-branch')"
+# We don't need arrow's dependencies, only lintr (and its dependencies)
+RUN R -e "install.packages('lintr')"
 
 # Docker linter
 COPY --from=hadolint /bin/hadolint /usr/bin/hadolint
diff --git a/r/.lintr b/r/.lintr
index 0298fd7f99..619339afca 100644
--- a/r/.lintr
+++ b/r/.lintr
@@ -14,7 +14,7 @@ license:  #  Licensed to the Apache Software Foundation (ASF) under one
   #  KIND, either express or implied.  See the License for the
   #  specific language governing permissions and limitations
   #  under the License.
-linters: with_defaults(
+linters: linters_with_defaults(
   line_length_linter = line_length_linter(120),
   object_name_linter = NULL,
   # Even with a liberal definition of name styles, some of our names cause issues due to `.`s for s3 classes or NA in the name
@@ -22,8 +22,8 @@ linters: with_defaults(
   # object_name_linter = object_name_linter(styles = c("snake_case", "camelCase", "CamelCase", "symbols", "dotted.case", "UPPERCASE", "SNAKE_CASE")),
   object_length_linter = object_length_linter(40),
   object_usage_linter = NULL, # R6 methods are flagged,
-  cyclocomp_linter = cyclocomp_linter(26), # TODO: reduce to default of 15
-  open_curly_linter = NULL # styler and lintr conflict on this (https://github.com/r-lib/styler/issues/549#issuecomment-537191536)
+  cyclocomp_linter = cyclocomp_linter(26) # TODO: reduce to default of 15
+  # See also https://github.com/r-lib/lintr/issues/804 for cyclocomp issues with R6
   )
 exclusions: list(
   "R/arrowExports.R",
diff --git a/r/lint.sh b/r/lint.sh
index 91435e7e01..21e7374733 100755
--- a/r/lint.sh
+++ b/r/lint.sh
@@ -51,4 +51,4 @@ $CPP_BUILD_SUPPORT/run_cpplint.py \
 
 # Run lintr
R -e "if(!requireNamespace('lintr', quietly=TRUE)){stop('lintr is not installed, please install it with R -e \"install.packages(\'lintr\')\"')}"
-NOT_CRAN=true R -e "lintr::lint_package('${SOURCE_DIR}', path_prefix = 'r')"
+NOT_CRAN=true R -e "lintr::lint_package('${SOURCE_DIR}')"
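The `.lintr` change in this commit tracks the lintr 3.0 API rename from `with_defaults()` to `linters_with_defaults()`. A minimal sketch of building such a linter set in code (assumes lintr >= 3.0 is installed; the specific linters mirror the diff above, not a complete config):

```r
# Sketch only: mirrors the .lintr settings from the diff above.
# Requires lintr >= 3.0, where with_defaults() was deprecated in
# favor of linters_with_defaults().
library(lintr)

linters <- linters_with_defaults(
  line_length_linter = line_length_linter(120),
  object_name_linter = NULL,  # disabled, as in the diff
  object_length_linter = object_length_linter(40),
  cyclocomp_linter = cyclocomp_linter(26)
)
```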
diff --git a/r/vignettes/developers/workflow.Rmd b/r/vignettes/developers/workflow.Rmd
index b7e0a27d76..cb88a6af6c 100644
--- a/r/vignettes/developers/workflow.Rmd
+++ b/r/vignettes/developers/workflow.Rmd
@@ -7,7 +7,6 @@ knitr::opts_chunk$set(error = TRUE, eval = FALSE)
 The Arrow R package uses several additional development tools:
 
 * [`lintr`](https://github.com/r-lib/lintr) for code analysis
-  - for the time being, the R package uses a custom version of lintr - `jonkeane/lintr@arrow-branch`
 * [`styler`](https://styler.r-lib.org) for code styling
 * [`pkgdown`](https://pkgdown.r-lib.org) for building the website
 * [`roxygen2`](https://roxygen2.r-lib.org) for documenting the package
@@ -16,8 +15,7 @@ The Arrow R package uses several additional development tools:
 You can install all these additional dependencies by running:
 
 

[arrow] branch master updated: ARROW-16899: [R][CI] R nightly builds used old libarrow (#13411)

2022-06-24 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 9e5d3e6f87 ARROW-16899: [R][CI] R nightly builds used old libarrow (#13411)
9e5d3e6f87 is described below

commit 9e5d3e6f87a6a3a4ae8384f68f60b7b739a72e45
Author: Jacob Wujciak-Jens 
AuthorDate: Fri Jun 24 18:13:02 2022 +0200

ARROW-16899: [R][CI] R nightly builds used old libarrow (#13411)

Authored-by: Jacob Wujciak-Jens 
Signed-off-by: Neal Richardson 
---
 dev/tasks/macros.jinja  | 19 ---
 dev/tasks/r/github.packages.yml | 31 +--
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/dev/tasks/macros.jinja b/dev/tasks/macros.jinja
index 4e7fc4cf35..03de66cbe1 100644
--- a/dev/tasks/macros.jinja
+++ b/dev/tasks/macros.jinja
@@ -293,6 +293,19 @@ on:
 shell: Rscript {0}
 run: |
   # getwd() is necessary as this macro is used within jobs using a docker container
-  tools::write_PACKAGES(file.path(getwd(), "/repo/src/contrib", fsep = "/"), type = "source", verbose = TRUE)
-  - run: ls -R repo
-{% endmacro %}
+  tools::write_PACKAGES(file.path(getwd(), "repo/src/contrib", fsep = "/"), type = "source", verbose = TRUE)
+  - name: Show repo 
+shell: bash 
+# tree not available in git-bash on windows
+run: |
+  ls -R repo
+  - name: Add dev repo to .Rprofile
+shell: Rscript {0}
+run: |
+  str <- paste0("options(arrow.dev_repo ='file://", getwd(), "/repo' )")
+  print(str)
+  profile_path <- file.path(getwd(), ".Rprofile")
+  write(str, file = profile_path, append = TRUE)
+  # Set envvar for later steps by appending to $GITHUB_ENV 
+  write(paste0("R_PROFILE_USER=", profile_path), file = Sys.getenv("GITHUB_ENV"), append = TRUE)
+  {% endmacro %}
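The macro above persists `R_PROFILE_USER` by appending a `KEY=value` line to the file named in `$GITHUB_ENV`, which GitHub Actions reads back before each subsequent step. A self-contained sketch of the same mechanism (a temp file stands in for the runner-provided one):

```r
# Sketch: how the macro above persists a variable for later workflow steps.
# GitHub Actions reads KEY=value lines appended to the file named in
# $GITHUB_ENV; a temp file stands in for the runner-provided one here.
Sys.setenv(GITHUB_ENV = tempfile())

profile_path <- file.path(getwd(), ".Rprofile")
write(paste0("R_PROFILE_USER=", profile_path),
      file = Sys.getenv("GITHUB_ENV"), append = TRUE)

readLines(Sys.getenv("GITHUB_ENV"))  # "R_PROFILE_USER=/path/to/.Rprofile"
```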
diff --git a/dev/tasks/r/github.packages.yml b/dev/tasks/r/github.packages.yml
index 30afda4ffa..4f5caa0e1c 100644
--- a/dev/tasks/r/github.packages.yml
+++ b/dev/tasks/r/github.packages.yml
@@ -158,6 +158,8 @@ jobs:
   - name: Build Binary
 id: build
 shell: Rscript {0}
+env:  
+  ARROW_R_DEV: TRUE
 run: |
   on_windows <- tolower(Sys.info()[["sysname"]]) == "windows"
 
@@ -166,17 +168,9 @@ jobs:
 type = "binary",
 repos = c("https://nightlies.apache.org/arrow/r", "https://cloud.r-project.org")
   )
-  remove.packages("arrow")
 
-  # Setup local repo
-  dev_repo <- paste0(
-ifelse(on_windows, "file:", "file://"),
-getwd(),
-"/repo")
-  
-  # This is necessary to use the local folder as a repo in both
-  # install_arrow & tools/*libs.R
-  options(arrow.dev_repo = dev_repo)
+  cat("Remove old arrow version.\n")
+  remove.packages("arrow")
   
   # Build
   Sys.setenv(MAKEFLAGS = paste0("-j", parallel::detectCores()))
@@ -186,11 +180,12 @@ jobs:
 INSTALL_opts <- c(INSTALL_opts, "--strip")
   }
 
- 
+  cat("Install arrow from dev repo.\n")
   install.packages(
 "arrow",
 type = "source",
-repos = dev_repo,
+# The sub is necessary to prevent an error on windows.
+repos = sub("file://", "file:", getOption("arrow.dev_repo")),
 INSTALL_opts = INSTALL_opts
   )
 
@@ -248,15 +243,11 @@ jobs:
 
   # Add R-devel to PATH
   echo "/opt/R-devel/bin" >> $GITHUB_PATH
+
   {{ macros.github_setup_local_r_repo(true, false)|indent }}
-  - name: Set dev repo
-shell: bash
-run: |
-  # It is important to use pwd here as this happens inside a container so the
-  # normal github.workspace path is wrong.
-  echo "options(arrow.dev_repo = 'file://$(pwd)/repo')" >> ~/.Rprofile
   - name: Install arrow from our repo
 env:
+  ARROW_R_DEV: TRUE
   LIBARROW_BUILD: "FALSE"
   LIBARROW_BINARY: "TRUE"
 shell: Rscript {0}
@@ -273,10 +264,6 @@ jobs:
 with:
   install-r: false
   {{ macros.github_setup_local_r_repo(false, false)|indent }}
-  - name: Set dev repo
-shell: bash
-run: |
-  echo "options(arrow.dev_repo = 'file://$(pwd)/repo')" >> ~/.Rprofile
   - name: Install arrow from nightly repo
 env:
   # Test source build so be sure not to download a binary



[arrow] branch master updated: ARROW-16689: [CI] Improve R Nightly Workflow (#13266)

2022-06-07 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 46116c48c8 ARROW-16689: [CI] Improve R Nightly Workflow (#13266)
46116c48c8 is described below

commit 46116c48c8037117ba71f91b7d0f17d22de0b530
Author: Jacob Wujciak-Jens 
AuthorDate: Tue Jun 7 16:28:53 2022 +0200

ARROW-16689: [CI] Improve R Nightly Workflow (#13266)

Lead-authored-by: Jacob Wujciak-Jens 
Co-authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 .github/workflows/r_nightly.yml| 83 +-
 LICENSE.txt|  8 +++
 dev/tasks/macros.jinja | 28 
 .../r/{github.nightly.yml => github.packages.yml}  | 64 ++---
 dev/tasks/tasks.yml| 13 +++-
 5 files changed, 119 insertions(+), 77 deletions(-)

diff --git a/.github/workflows/r_nightly.yml b/.github/workflows/r_nightly.yml
index 8fb96a2796..0f657a85ad 100644
--- a/.github/workflows/r_nightly.yml
+++ b/.github/workflows/r_nightly.yml
@@ -16,6 +16,10 @@
 # under the License.
 
 name: Upload R Nightly builds
+# This workflow downloads the (nightly) binaries created in crossbow and uploads them
+# to nightlies.apache.org. Due to authorization requirements, this upload can't be done
+# from the crossbow repository.
 
 on:
   workflow_dispatch:
@@ -25,13 +29,11 @@ on:
 required: false
 default: ''
   schedule:
-#Crossbow packagin runs at 0 8 * * *
+#Crossbow packaging runs at 0 8 * * *
 - cron: '0 14 * * *'
 
 jobs:
   upload:
-env:
-  PREFIX: ${{ github.event.inputs.prefix || ''}}
 runs-on: ubuntu-latest
 steps:
   - name: Checkout Arrow
@@ -59,59 +61,70 @@ jobs:
 run: pip install -e arrow/dev/archery[all]
   - run: mkdir -p binaries
   - name: Download Artifacts
+env:
+  PREFIX: ${{ github.event.inputs.prefix || ''}}
 run: |
   if [ -z $PREFIX ]; then
 PREFIX=nightly-packaging-$(date +%Y-%m-%d)-0
   fi
   echo $PREFIX
 
-  archery crossbow download-artifacts -f r-nightly-packages -t binaries --skip-pattern-validation $PREFIX
+  archery crossbow download-artifacts -f r-nightly-packages -t binaries $PREFIX
+
+  if [ -n "$(ls -A binaries/*/*/)" ]; then
+echo "Found files!"
+  else
+echo "No files found. Stopping upload."
+exit 1
+  fi
   - name: Build Repository
 shell: Rscript {0}
 run: |
+  # folder that we rsync to nightlies.apache.org
+  repo_root <- "repo"
+  # The binaries are in a nested dir 
+  # so we need to find the correct path.
   art_path <- list.files("binaries",
-  recursive = TRUE,
-  include.dirs = TRUE,
-  pattern = "r-nightly-packages$",
-  full.names = TRUE
+recursive = TRUE,
+include.dirs = TRUE,
+pattern = "r-nightly-packages$",
+full.names = TRUE
   )
 
-  pkgs <- list.files(art_path, pattern = "r-pkg_*")
-  src_i <- grep("r-pkg_src", pkgs)
-  src_pkg <- pkgs[src_i]
-  pkgs <- pkgs[-src_i]
-  libs <- list.files(art_path, pattern = "r-libarrow*")
+  current_path <- list.files(art_path, full.names = TRUE, recursive = TRUE)
+  files <- sub("r-(pkg|lib)", repo_root, current_path)
 
-  new_names <- sub("r-pkg_", "", pkgs, fixed = T)
-  matches <- regmatches(new_names, regexec("(([a-z]+)-[\\.a-zA-Z0-9]+)_(\\d\\.\\d)-(arrow.+)$", new_names))
+  # decode contrib.url from artifact name:
+  # bin__windows__contrib__4.1 -> bin/windows/contrib/4.1
+  new_paths <- gsub("__", "/", files)
+  # strip superfluous nested dirs
+  new_paths <- sub(art_path, ".", new_paths)
+  dirs <- dirname(new_paths)
+  dir_result <- sapply(dirs, dir.create, recursive = TRUE)
 
-  dir.create("repo/src/contrib", recursive = TRUE)
-  file.copy(paste0(art_path, "/", src_pkg), paste0("repo/src/contrib/", sub("r-pkg_src-", "", src_pkg)))
-  tools::write_PACKAGES("repo/src/contrib", type = "source", verbose = TRUE)
+  if (!all(dir_result)) {
+stop("There was an issue while creating the folders!")
+  }
 
-  for (match in matches) {
-  path <- paste0("repo/bin/", match[[3]],

[arrow] branch master updated: MINOR: [R] Fix duckdb test for dbplyr 2.2.0 internals change (#13323)

2022-06-06 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c63788ff7 MINOR: [R] Fix duckdb test for dbplyr 2.2.0 internals change (#13323)
8c63788ff7 is described below

commit 8c63788ff7d52812599a546989b7df10887cb01e
Author: Neal Richardson 
AuthorDate: Mon Jun 6 16:40:56 2022 -0400

MINOR: [R] Fix duckdb test for dbplyr 2.2.0 internals change (#13323)

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/tests/testthat/test-duckdb.R | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/r/tests/testthat/test-duckdb.R b/r/tests/testthat/test-duckdb.R
index 82451017a4..088d7a4bbd 100644
--- a/r/tests/testthat/test-duckdb.R
+++ b/r/tests/testthat/test-duckdb.R
@@ -279,7 +279,8 @@ test_that("to_duckdb passing a connection", {
   table_four <- ds %>%
 select(int, lgl, dbl) %>%
 to_duckdb(con = con_separate, auto_disconnect = FALSE)
-  table_four_name <- table_four$ops$x
+  # dbplyr 2.2.0 renames this internal attribute to lazy_query
+  table_four_name <- table_four$ops$x %||% table_four$lazy_query$x
 
   result <- DBI::dbGetQuery(
 con_separate,



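The one-line test fix in this commit leans on the null-default operator `%||%` (rlang semantics, re-exported in arrow): it returns its left-hand side unless that is `NULL`. A base-R sketch of the pattern, with hypothetical stand-in objects for the two dbplyr layouts:

```r
# Sketch of the null-default pattern from the duckdb test fix above.
# `%||%` returns the left-hand side unless it is NULL (rlang semantics).
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-ins: dbplyr < 2.2.0 kept the table name under $ops$x,
# dbplyr >= 2.2.0 moved it under $lazy_query$x.
old_style <- list(ops = list(x = "tbl_a"))
new_style <- list(lazy_query = list(x = "tbl_b"))

old_style$ops$x %||% old_style$lazy_query$x  # "tbl_a"
new_style$ops$x %||% new_style$lazy_query$x  # "tbl_b"
```

The same expression therefore works against either dbplyr version, which is why the test needs no version check.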
[arrow] branch master updated: MINOR: [R] Drop opensuse42 build and update opensuse15 (#13312)

2022-06-05 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new b821c16e97 MINOR: [R] Drop opensuse42 build and update opensuse15 (#13312)
b821c16e97 is described below

commit b821c16e976728e617599c8127c85c273cd069a1
Author: Neal Richardson 
AuthorDate: Sun Jun 5 16:52:51 2022 -0400

MINOR: [R] Drop opensuse42 build and update opensuse15 (#13312)

The opensuse42 job has been failing for a while on nightlies, and it is EOL 
and RSPM is no longer doing anything for it, so we should drop it.

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 ci/etc/rprofile | 8 +---
 dev/tasks/tasks.yml | 3 +--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/ci/etc/rprofile b/ci/etc/rprofile
index e9e98b12e4..2f64b17e5d 100644
--- a/ci/etc/rprofile
+++ b/ci/etc/rprofile
@@ -2,7 +2,9 @@ local({
   .pick_cran <- function() {
 # Return a CRAN repo URL, preferring RSPM binaries if available for this OS
 rspm_template <- "https://packagemanager.rstudio.com/cran/__linux__/%s/latest"
-supported_os <- c("focal", "xenial", "bionic", "centos7", "centos8", "opensuse42", "opensuse15", "opensuse152")
+# See https://github.com/rstudio/r-docker#releases-and-tags,
+# but note that RSPM still uses "centos8"
+supported_os <- c("bionic", "focal", "jammy", "centos7", "centos8", "opensuse153")
 
 if (nzchar(Sys.which("lsb_release"))) {
   os <- tolower(system("lsb_release -cs", intern = TRUE))
@@ -19,8 +21,8 @@ local({
 return(sprintf(rspm_template, os))
   } else {
 names(vals) <- sub("^(.*)=.*$", "\\1", os_release)
-if (vals["ID"] == "opensuse") {
-  version <- sub('^"?([0-9]+).*"?.*$', "\\1", vals["VERSION_ID"])
+if (grepl("opensuse", vals["ID"])) {
+  version <- sub('^"?([0-9]+)\\.?([0-9]+).*"?.*$', "\\1\\2", vals["VERSION_ID"])
   os <- paste0("opensuse", version)
   if (os %in% supported_os) {
 return(sprintf(rspm_template, os))
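The updated `VERSION_ID` regex above collapses an openSUSE version such as `15.3` into the `opensuse153` suffix RSPM expects, whereas the old pattern kept only the major version and produced `opensuse15`. A small sketch of the substitution in isolation (the helper name is ours, not from the commit):

```r
# Sketch: how the new VERSION_ID regex maps os-release values to RSPM names.
# Surrounding quotes and any trailing version components are stripped, and
# the major and minor digits are concatenated.
rspm_suffix <- function(version_id) {
  sub('^"?([0-9]+)\\.?([0-9]+).*"?.*$', "\\1\\2", version_id)
}

rspm_suffix('"15.3"')  # "153" (quoted, as in /etc/os-release)
rspm_suffix("15.3")    # "153"
```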
diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml
index 11675e8bba..7a8fd83161 100644
--- a/dev/tasks/tasks.yml
+++ b/dev/tasks/tasks.yml
@@ -1331,8 +1331,7 @@ tasks:
 {% for r_org, r_image, r_tag in [("rhub", "ubuntu-gcc-release", "latest"),
  ("rocker", "r-base", "latest"),
  ("rstudio", "r-base", "4.2-focal"),
- ("rstudio", "r-base", "4.1-opensuse15"),
- ("rstudio", "r-base", "4.2-opensuse42")] %}
+ ("rstudio", "r-base", "4.1-opensuse153")] %}
   test-r-{{ r_org }}-{{ r_image }}-{{ r_tag }}:
 ci: azure
 template: r/azure.linux.yml



[arrow] branch master updated: ARROW-16607: [R] Improve KeyValueMetadata handling

2022-05-26 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new a6025f1571 ARROW-16607: [R] Improve KeyValueMetadata handling
a6025f1571 is described below

commit a6025f15712aa0829aab748a8d3e776f335265cc
Author: Neal Richardson 
AuthorDate: Thu May 26 13:14:31 2022 -0400

ARROW-16607: [R] Improve KeyValueMetadata handling

* Pushes KVM handling into ExecPlan so that Run() preserves the R metadata 
we want.
* Also pushes special handling for a kind of collapsed query from collect() 
into Build().
* Better encapsulate KVM for the $metadata and $r_metadata so that as a 
user/developer, you never have to touch the serialize/deserialize functions, 
you just have a list to work with. This is a slight API change, most noticeable 
if you were to `print(tab$metadata)`; better is to `print(str(tab$metadata))`.
* Factor out a common utility in r/src for taking cpp11::strings (named 
character vector) and producing arrow::KeyValueMetadata

The upshot of all of this is that we can push the ExecPlan evaluation into 
`as_record_batch_reader()`, and all that `collect()` does on top is read the 
RBR into a Table/data.frame. This means that we can plug dplyr queries into 
anything else that expects a RecordBatchReader, and it will be (to the maximum 
extent possible, given the limitations of ExecPlan) streaming, not requiring 
you to `compute()` and materialize things first.
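Illustrative usage of the streaming behavior described above (a hypothetical session, assuming the arrow and dplyr packages with `as_record_batch_reader()` support for dplyr queries as introduced here):

```r
library(arrow)
library(dplyr)

# A dplyr query on Arrow data can be handed to any consumer of a
# RecordBatchReader without materializing it with compute() first.
reader <- arrow_table(mtcars) %>%
  filter(cyl == 6) %>%
  select(mpg, cyl) %>%
  as_record_batch_reader()

# Consume the stream one batch at a time.
batch <- reader$read_next_batch()
```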

Closes #13210 from nealrichardson/kvm

Authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/R/arrow-tabular.R  |  17 +
 r/R/arrowExports.R   |   4 +-
 r/R/dataset-scan.R   |   4 +-
 r/R/dataset-write.R  |  16 +---
 r/R/dplyr-collect.R  |  31 +---
 r/R/dplyr-group-by.R |   4 +-
 r/R/metadata.R   |  15 ++--
 r/R/query-engine.R   | 159 ++-
 r/R/record-batch-reader.R|   6 +-
 r/R/record-batch.R   |   2 +-
 r/R/schema.R |  26 ---
 r/R/table.R  |   2 +-
 r/src/arrowExports.cpp   |   9 ++-
 r/src/compute-exec.cpp   |  22 +++---
 r/src/schema.cpp |  13 ++--
 r/tests/testthat/test-metadata.R |   7 +-
 16 files changed, 177 insertions(+), 160 deletions(-)

diff --git a/r/R/arrow-tabular.R b/r/R/arrow-tabular.R
index 43110ccf24..58a604ba61 100644
--- a/r/R/arrow-tabular.R
+++ b/r/R/arrow-tabular.R
@@ -70,7 +70,6 @@ ArrowTabular <- R6Class("ArrowTabular",
 self$schema$metadata
   } else {
 # Set the metadata
-new <- prepare_key_value_metadata(new)
 out <- self$ReplaceSchemaMetadata(new)
 # ReplaceSchemaMetadata returns a new object but we're modifying in 
place,
 # so swap in that new C++ object pointer into our R6 object
@@ -82,16 +81,10 @@ ArrowTabular <- R6Class("ArrowTabular",
   # Helper for the R metadata that handles the serialization
   # See also method on Schema
   if (missing(new)) {
-out <- self$metadata$r
-if (!is.null(out)) {
-  # Can't unserialize NULL
-  out <- .unserialize_arrow_r_metadata(out)
-}
-# Returns either NULL or a named list
-out
+self$metadata$r
   } else {
 # Set the R metadata
-self$metadata$r <- .serialize_arrow_r_metadata(new)
+self$metadata$r <- new
 self
   }
 }
@@ -101,11 +94,7 @@ ArrowTabular <- R6Class("ArrowTabular",
 #' @export
 as.data.frame.ArrowTabular <- function(x, row.names = NULL, optional = FALSE, 
...) {
   df <- x$to_data_frame()
-
-  if (!is.null(r_metadata <- x$metadata$r)) {
-df <- apply_arrow_r_metadata(df, .unserialize_arrow_r_metadata(r_metadata))
-  }
-  df
+  apply_arrow_r_metadata(df, x$metadata$r)
 }
 
 #' @export
diff --git a/r/R/arrowExports.R b/r/R/arrowExports.R
index 3414c9b21c..8ad56f227f 100644
--- a/r/R/arrowExports.R
+++ b/r/R/arrowExports.R
@@ -404,8 +404,8 @@ ExecPlan_create <- function(use_threads) {
   .Call(`_arrow_ExecPlan_create`, use_threads)
 }
 
-ExecPlan_run <- function(plan, final_node, sort_options, head) {
-  .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options, head)
+ExecPlan_run <- function(plan, final_node, sort_options, metadata, head) {
+  .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options, metadata, head)
 }
 
 ExecPlan_StopProducing <- function(plan) {
diff --git a/r/R/dataset-scan.R b/r/R/dataset-scan.R
index 72f9dec276..a8da1fb60d 100644
--- a/r/R/dataset-scan.R
+++ b/r/R/dataset-scan.R
@@ -206,10 +206,8 @@ map_batches <- function(X, FUN, ..., .data.frame = NULL) {
   call. = FALSE
 )
   }
-  plan <- Ex

[arrow] branch master updated (6576aa06fd -> d889adec54)

2022-05-24 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 6576aa06fd ARROW-16634: [Gandiva][C++] Add udfdegrees alias
 add d889adec54 ARROW-15622: [R] Implement union_all and union for 
arrow_dplyr_query

No new revisions were added by this update.

Summary of changes:
 r/DESCRIPTION  |  1 +
 r/R/arrow-package.R|  2 +-
 r/R/arrowExports.R |  5 +-
 .../testthat/test-array-data.R => R/dplyr-union.R} | 28 
 r/R/query-engine.R |  7 ++
 r/src/arrowExports.cpp | 10 +++
 r/src/compute-exec.cpp |  7 ++
 r/tests/testthat/test-dplyr-union.R| 74 ++
 8 files changed, 120 insertions(+), 14 deletions(-)
 copy r/{tests/testthat/test-array-data.R => R/dplyr-union.R} (59%)
 create mode 100644 r/tests/testthat/test-dplyr-union.R



[arrow] branch master updated: ARROW-16594: [R] Consistently use "getOption" to set nightly repo

2022-05-20 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new b7507c34b7 ARROW-16594: [R] Consistently use "getOption" to set 
nightly repo
b7507c34b7 is described below

commit b7507c34b71a200c9f08597b83e935d1639dd85c
Author: Jacob Wujciak-Jens 
AuthorDate: Fri May 20 07:52:44 2022 -0700

ARROW-16594: [R] Consistently use "getOption" to set nightly repo

The behavior can be seen in action 
[here](https://github.com/assignUser/test-repo-a/actions/runs/2340358110) where 
I build this branch with the daily version number `20220517` which does not yet 
exists in the s3 bucket.
~~It actually looks like it is not working for linux binary builds 
https://github.com/assignUser/test-repo-a/runs/6472478941?check_suite_focus=true#step:4:153~~
 This issue was due to .Rprofile configuration.

Closes #13173 from assignUser/ARROW-16594-option-devrepo

Authored-by: Jacob Wujciak-Jens 
Signed-off-by: Neal Richardson 
---
 r/tools/nixlibs.R | 2 +-
 r/tools/winlibs.R | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/r/tools/nixlibs.R b/r/tools/nixlibs.R
index fc523f49ed..5b8cc3b72d 100644
--- a/r/tools/nixlibs.R
+++ b/r/tools/nixlibs.R
@@ -19,7 +19,7 @@ args <- commandArgs(TRUE)
 VERSION <- args[1]
 dst_dir <- paste0("libarrow/arrow-", VERSION)
 
-arrow_repo <- "https://arrow-r-nightly.s3.amazonaws.com/libarrow/"
+arrow_repo <- paste0(getOption("arrow.dev_repo", "https://arrow-r-nightly.s3.amazonaws.com"), "/libarrow/")
 
 options(.arrow.cleanup = character()) # To collect dirs to rm on exit
 on.exit(unlink(getOption(".arrow.cleanup")))
diff --git a/r/tools/winlibs.R b/r/tools/winlibs.R
index 9435ac3c20..4adedbddb2 100644
--- a/r/tools/winlibs.R
+++ b/r/tools/winlibs.R
@@ -38,7 +38,10 @@ if (!file.exists(sprintf("windows/arrow-%s/include/arrow/api.h", VERSION))) {
   )
 }
 # URL templates
-nightly <- "https://arrow-r-nightly.s3.amazonaws.com/libarrow/bin/windows/arrow-%s.zip"
+nightly <- paste0(
+  getOption("arrow.dev_repo", "https://arrow-r-nightly.s3.amazonaws.com"),
+  "/libarrow/bin/windows/arrow-%s.zip"
+)
 rwinlib <- "https://github.com/rwinlib/arrow/archive/v%s.zip"
 # First look for a nightly
 get_file(nightly, VERSION)
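Both scripts now resolve the nightly repo through the same option, falling back to the public bucket. A base-R sketch of the `getOption()` fallback (the `file://` value is a hypothetical override, as a CI job might set):

```r
# getOption(name, default) returns the default only when the option is unset.
getOption("arrow.dev_repo", "https://arrow-r-nightly.s3.amazonaws.com")
# -> "https://arrow-r-nightly.s3.amazonaws.com" in a fresh session

# Hypothetical override, e.g. pointing at a local repo built in CI:
options(arrow.dev_repo = "file:///tmp/local-repo")
getOption("arrow.dev_repo", "https://arrow-r-nightly.s3.amazonaws.com")
# -> "file:///tmp/local-repo"
```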



[arrow] branch master updated (663dc325de -> dc39f83e2f)

2022-05-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


from 663dc325de MINOR: [R] Clarify read_json_arrow() docs
 add dc39f83e2f ARROW-15271: [R] Refactor do_exec_plan to return a 
RecordBatchReader

No new revisions were added by this update.

Summary of changes:
 r/NAMESPACE |  2 ++
 r/R/dataset-scan.R  | 52 +
 r/R/dplyr-collect.R | 17 +-
 r/R/duckdb.R| 14 ++--
 r/R/query-engine.R  | 44 +++-
 r/R/record-batch-reader.R   | 19 ---
 r/R/record-batch.R  |  6 
 r/R/table.R |  7 
 r/man/as_record_batch.Rd|  3 ++
 r/man/map_batches.Rd| 26 ++-
 r/man/to_arrow.Rd   |  8 ++---
 r/src/arrowExports.cpp  |  2 +-
 r/src/recordbatchreader.cpp |  5 +--
 r/tests/testthat/test-dataset-write.R   | 10 +++---
 r/tests/testthat/test-dataset.R | 31 +
 r/tests/testthat/test-duckdb.R  |  2 +-
 r/tests/testthat/test-record-batch-reader.R | 10 --
 17 files changed, 155 insertions(+), 103 deletions(-)



[arrow] branch master updated: MINOR: [R] Clarify read_json_arrow() docs

2022-05-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 663dc325de MINOR: [R] Clarify read_json_arrow() docs
663dc325de is described below

commit 663dc325de1176a5caf32809942acae98abf7a8b
Author: Edward Visel <1693477+alistair...@users.noreply.github.com>
AuthorDate: Wed May 18 15:30:54 2022 -0700

MINOR: [R] Clarify read_json_arrow() docs

A quick PR to clarify `read_json_arrow()` docs I found confusing while 
benchmarking. Specifically, specifies the function

- is for ndjson (as opposed to say the many json formats to which pandas 
can write a dataframe)
- handles compression
- handles implicit and explicit nulls (was in the example, but not 
previously stated)

Open to changes, but do feel these docs need to at least explicitly say 
"ndjson" somewhere.

Closes #13133 from alistaire47/chore/read-json-docs

Lead-authored-by: Edward Visel 
<1693477+alistair...@users.noreply.github.com>
Co-authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/R/json.R   | 6 +-
 r/man/read_json_arrow.Rd | 7 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/r/R/json.R b/r/R/json.R
index 08798bb2e5..19cf6a9299 100644
--- a/r/R/json.R
+++ b/r/R/json.R
@@ -17,7 +17,11 @@
 
 #' Read a JSON file
 #'
-#' Using [JsonTableReader]
+#' Wrapper around [JsonTableReader] to read a newline-delimited JSON (ndjson) file into a
+#' data frame or Arrow Table.
+#'
+#' If passed a path, will detect and handle compression from the file extension
+#' (e.g. `.json.gz`). Accepts explicit or implicit nulls.
 #'
 #' @inheritParams read_delim_arrow
 #' @param schema [Schema] that describes the table.
diff --git a/r/man/read_json_arrow.Rd b/r/man/read_json_arrow.Rd
index 610867ca40..2ad600725f 100644
--- a/r/man/read_json_arrow.Rd
+++ b/r/man/read_json_arrow.Rd
@@ -36,7 +36,12 @@ an Arrow \link{Table}?}
 A \code{data.frame}, or a Table if \code{as_data_frame = FALSE}.
 }
 \description{
-Using \link{JsonTableReader}
+Wrapper around \link{JsonTableReader} to read a newline-delimited JSON (ndjson) file into a
+data frame or Arrow Table.
+}
+\details{
+If passed a path, will detect and handle compression from the file extension
+(e.g. \code{.json.gz}). Accepts explicit or implicit nulls.
 }
 \examples{
\dontshow{if (arrow_with_json()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}



[arrow] branch master updated: ARROW-16144: [R] Write compressed data streams (particularly over S3)

2022-05-18 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new d2cbe9e0e2 ARROW-16144: [R] Write compressed data streams (particularly over S3)
d2cbe9e0e2 is described below

commit d2cbe9e0e2ce206fba71d3d171babe36bada1a9d
Author: Sam Albers 
AuthorDate: Wed May 18 14:24:55 2022 -0700

ARROW-16144: [R] Write compressed data streams (particularly over S3)

This PR enables reading/writing compressed data streams over s3 and locally 
and adds some tests to test some of those round trips. For the filesystem path 
I had to do a little regex on the string for compression detection but any 
feedback on alternative approaches is very welcome. Previously supplying a file 
with a compression extension wrote out an uncompressed file. Here is a reprex 
of the updated writing behaviour:

```r
library(arrow, warn.conflicts = FALSE)
## local
write_csv_arrow(mtcars, file = file)
write_csv_arrow(mtcars, file = comp_file)
file.size(file)
[1] 1303
file.size(comp_file)
[1] 567

## or with s3
dir <- tempfile()
dir.create(dir)
subdir <- file.path(dir, "bucket")
dir.create(subdir)

minio_server <- processx::process$new("minio", args = c("server", dir), supervise = TRUE)
Sys.sleep(2)
stopifnot(minio_server$is_alive())

s3_uri <- "s3://minioadmin:minioadmin@?scheme=http&endpoint_override=localhost%3A9000"
bucket <- s3_bucket(s3_uri)

write_csv_arrow(mtcars, bucket$path("bucket/data.csv.gz"))
write_csv_arrow(mtcars, bucket$path("bucket/data.csv"))

file.size(file.path(subdir, "data.csv.gz"))
[1] 567
file.size(file.path(subdir, "data.csv"))
[1] 1303
```

Closes #13183 from boshek/ARROW-16144

Lead-authored-by: Sam Albers 
Co-authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 r/R/io.R | 23 ++-
 r/R/util.R   |  4 
 r/tests/testthat/test-csv.R  | 17 +
 r/tests/testthat/test-s3-minio.R | 20 
 4 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/r/R/io.R b/r/R/io.R
index 379dcf6f35..8e72187b43 100644
--- a/r/R/io.R
+++ b/r/R/io.R
@@ -270,7 +270,7 @@ make_readable_file <- function(file, mmap = TRUE, compression = NULL, filesystem
   file <- ReadableFile$create(file)
 }
 
-if (!identical(compression, "uncompressed")) {
+if (is_compressed(compression)) {
   file <- CompressedInputStream$create(file, compression)
 }
   } else if (inherits(file, c("raw", "Buffer"))) {
@@ -292,7 +292,7 @@ make_readable_file <- function(file, mmap = TRUE, compression = NULL, filesystem
   file
 }
 
-make_output_stream <- function(x, filesystem = NULL) {
+make_output_stream <- function(x, filesystem = NULL, compression = NULL) {
   if (inherits(x, "connection")) {
 if (!isOpen(x)) {
   open(x, "wb")
@@ -309,11 +309,21 @@ make_output_stream <- function(x, filesystem = NULL) {
 filesystem <- fs_and_path$fs
 x <- fs_and_path$path
   }
+
+  if (is.null(compression)) {
+# Infer compression from sink
+compression <- detect_compression(x)
+  }
+
   assert_that(is.string(x))
-  if (is.null(filesystem)) {
-FileOutputStream$create(x)
+  if (is.null(filesystem) && is_compressed(compression)) {
+CompressedOutputStream$create(x) ##compressed local
+  } else if (is.null(filesystem) && !is_compressed(compression)) {
+FileOutputStream$create(x) ## uncompressed local
+  } else if (!is.null(filesystem) && is_compressed(compression)) {
+CompressedOutputStream$create(filesystem$OpenOutputStream(x)) ## compressed remote
   } else {
-filesystem$OpenOutputStream(x)
+filesystem$OpenOutputStream(x) ## uncompressed remote
   }
 }
 
@@ -322,6 +332,9 @@ detect_compression <- function(path) {
 return("uncompressed")
   }
 
+  # Remove any trailing slashes, which FileSystem$from_uri may add
+  path <- gsub("/$", "", path)
+
   switch(tools::file_ext(path),
 bz2 = "bz2",
 gz = "gzip",
diff --git a/r/R/util.R b/r/R/util.R
index ff2bb070b8..4aff69e471 100644
--- a/r/R/util.R
+++ b/r/R/util.R
@@ -211,3 +211,7 @@ handle_csv_read_error <- function(e, schema, call) {
   }
   abort(msg, call = call)
 }
+
+is_compressed <- function(compression) {
+  !identical(compression, "uncompressed")
+}
diff --git a/r/tests/testthat/test-csv.R b/r/tests/testthat/test-csv.R
index 631e75fd74..8e463d3abe 100644
--- a/r/tests/testthat/test-csv.R
+++ b/r/tests/testthat/test-csv.R
@@ -564,6 +564,23 @@
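For readers skimming the patch, the sink-selection logic in the io.R hunks above is compact enough to transliterate. This is a minimal sketch in Python (not the package's API; the real helpers are `detect_compression()`, `is_compressed()`, and `make_output_stream()` in r/R/io.R and r/R/util.R), covering only the extension mapping and the four-way local/remote × compressed/uncompressed branch visible in this diff:

```python
import os

def detect_compression(path):
    """Infer a codec from the file extension, after stripping any
    trailing slash that FileSystem$from_uri may append."""
    if not isinstance(path, str):
        return "uncompressed"
    ext = os.path.splitext(path.rstrip("/"))[1].lstrip(".")
    # Only the extensions visible in this hunk; the real helper maps more.
    return {"bz2": "bz2", "gz": "gzip"}.get(ext, "uncompressed")

def is_compressed(compression):
    return compression != "uncompressed"

def make_output_stream(path, filesystem=None, compression=None):
    """Mirror the four-way branch in make_output_stream().
    Returns a descriptive label instead of a real stream object."""
    if compression is None:
        compression = detect_compression(path)  # infer from the sink
    local = filesystem is None
    if local and is_compressed(compression):
        return "CompressedOutputStream"            # compressed local
    elif local:
        return "FileOutputStream"                  # uncompressed local
    elif is_compressed(compression):
        return "CompressedOutputStream(filesystem.OpenOutputStream)"  # compressed remote
    else:
        return "filesystem.OpenOutputStream"       # uncompressed remote

print(make_output_stream("data.csv.gz"))   # compressed local branch
print(make_output_stream("data.csv"))      # uncompressed local branch
```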

[arrow] branch master updated: ARROW-16539: [C++] Bump bundled thrift to 0.16.0

2022-05-13 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 235767d2db ARROW-16539: [C++] Bump bundled thrift to 0.16.0
235767d2db is described below

commit 235767d2dbf1c6839057a21631680e021f3da3e3
Author: Sutou Kouhei 
AuthorDate: Fri May 13 16:02:18 2022 -0400

ARROW-16539: [C++] Bump bundled thrift to 0.16.0

Closes #13122 from nealrichardson/bump-thrift

Lead-authored-by: Sutou Kouhei 
Co-authored-by: Neal Richardson 
Signed-off-by: Neal Richardson 
---
 cpp/cmake_modules/ThirdpartyToolchain.cmake | 40 -
 cpp/thirdparty/versions.txt |  4 +--
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index e8fcf33752..992c2102d2 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -1422,21 +1422,24 @@ macro(build_thrift)
   ${EP_COMMON_CMAKE_ARGS}
   "-DCMAKE_INSTALL_PREFIX=${THRIFT_PREFIX}"
   "-DCMAKE_INSTALL_RPATH=${THRIFT_PREFIX}/lib"
+  # Work around https://gitlab.kitware.com/cmake/cmake/issues/18865
+  -DBoost_NO_BOOST_CMAKE=ON
   -DBUILD_COMPILER=OFF
+  -DBUILD_EXAMPLES=OFF
   -DBUILD_SHARED_LIBS=OFF
   -DBUILD_TESTING=OFF
-  -DBUILD_EXAMPLES=OFF
   -DBUILD_TUTORIALS=OFF
-  -DWITH_QT4=OFF
+  -DCMAKE_DEBUG_POSTFIX=
+  -DWITH_AS3=OFF
+  -DWITH_CPP=ON
   -DWITH_C_GLIB=OFF
   -DWITH_JAVA=OFF
-  -DWITH_PYTHON=OFF
-  -DWITH_HASKELL=OFF
-  -DWITH_CPP=ON
-  -DWITH_STATIC_LIB=ON
+  -DWITH_JAVASCRIPT=OFF
   -DWITH_LIBEVENT=OFF
-  # Work around https://gitlab.kitware.com/cmake/cmake/issues/18865
-  -DBoost_NO_BOOST_CMAKE=ON)
+  -DWITH_NODEJS=OFF
+  -DWITH_PYTHON=OFF
+  -DWITH_QT5=OFF
+  -DWITH_ZLIB=OFF)
 
   # Thrift also uses boost. Forward important boost settings if there were ones passed.
   if(DEFINED BOOST_ROOT)
@@ -1446,21 +1449,22 @@ macro(build_thrift)
 list(APPEND THRIFT_CMAKE_ARGS "-DBoost_NAMESPACE=${Boost_NAMESPACE}")
   endif()
 
-  set(THRIFT_STATIC_LIB_NAME "${CMAKE_STATIC_LIBRARY_PREFIX}thrift")
   if(MSVC)
 if(ARROW_USE_STATIC_CRT)
-  set(THRIFT_STATIC_LIB_NAME "${THRIFT_STATIC_LIB_NAME}mt")
+  set(THRIFT_LIB_SUFFIX "mt")
   list(APPEND THRIFT_CMAKE_ARGS "-DWITH_MT=ON")
 else()
-  set(THRIFT_STATIC_LIB_NAME "${THRIFT_STATIC_LIB_NAME}md")
+  set(THRIFT_LIB_SUFFIX "md")
   list(APPEND THRIFT_CMAKE_ARGS "-DWITH_MT=OFF")
 endif()
+set(THRIFT_LIB
+"${THRIFT_PREFIX}/bin/${CMAKE_IMPORT_LIBRARY_PREFIX}thrift${THRIFT_LIB_SUFFIX}${CMAKE_IMPORT_LIBRARY_SUFFIX}"
+)
+  else()
+set(THRIFT_LIB
+"${THRIFT_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}thrift${CMAKE_STATIC_LIBRARY_SUFFIX}"
+)
   endif()
-  if(${UPPERCASE_BUILD_TYPE} STREQUAL "DEBUG")
-set(THRIFT_STATIC_LIB_NAME "${THRIFT_STATIC_LIB_NAME}d")
-  endif()
-  set(THRIFT_STATIC_LIB
-  "${THRIFT_PREFIX}/lib/${THRIFT_STATIC_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}")
 
   if(BOOST_VENDORED)
 set(THRIFT_DEPENDENCIES ${THRIFT_DEPENDENCIES} boost_ep)
@@ -1469,7 +1473,7 @@ macro(build_thrift)
   externalproject_add(thrift_ep
   URL ${THRIFT_SOURCE_URL}
   URL_HASH "SHA256=${ARROW_THRIFT_BUILD_SHA256_CHECKSUM}"
-  BUILD_BYPRODUCTS "${THRIFT_STATIC_LIB}"
+  BUILD_BYPRODUCTS "${THRIFT_LIB}"
   CMAKE_ARGS ${THRIFT_CMAKE_ARGS}
   DEPENDS ${THRIFT_DEPENDENCIES} ${EP_LOG_OPTIONS})
 
@@ -1477,7 +1481,7 @@ macro(build_thrift)
   # The include directory must exist before it is referenced by a target.
   file(MAKE_DIRECTORY "${THRIFT_INCLUDE_DIR}")
   set_target_properties(thrift::thrift
-PROPERTIES IMPORTED_LOCATION "${THRIFT_STATIC_LIB}"
+PROPERTIES IMPORTED_LOCATION "${THRIFT_LIB}"
INTERFACE_INCLUDE_DIRECTORIES "${THRIFT_INCLUDE_DIR}")
   if(CMAKE_VERSION VERSION_LESS 3.11)
 set_target_properties(${BOOST_LIBRARY} PROPERTIES INTERFACE_LINK_LIBRARIES
diff --git a/cpp/thirdparty/versions.txt b/cpp/thirdparty/versions.txt
index 3aa3ebe90f..776527fc2e 100644
--- a/cpp/thirdparty/versions.txt
+++ b/cpp/thirdparty/versions.txt
@@ -89,8 +89,8 @@ ARROW_SNAPPY_OLD_BUILD_VERSION=1.1.8
 
ARROW_SNAPPY_OLD_BUILD_SHA256_CHECKSUM=16b677f07832a612b0836178db7f374e414f94657c138e6993cbfc5dcc58

[arrow-site] branch asf-site updated: Backfill R news for 8.0.0 release (#214)

2022-05-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new f3e8fe1793 Backfill R news for 8.0.0 release (#214)
f3e8fe1793 is described below

commit f3e8fe179397b59f7375560dd69a970d7872e67c
Author: Neal Richardson 
AuthorDate: Thu May 12 17:04:45 2022 -0400

Backfill R news for 8.0.0 release (#214)


https://github.com/apache/arrow/commit/526fa070c82c0e1c6d26a4c1d06a591b37c05011 
apparently did not make it into the release tag
---
 docs/r/news/index.html | 108 +++--
 1 file changed, 96 insertions(+), 12 deletions(-)

diff --git a/docs/r/news/index.html b/docs/r/news/index.html
index 545c27d780..bd4fc2effc 100644
--- a/docs/r/news/index.html
+++ b/docs/r/news/index.html
@@ -128,27 +128,111 @@
 
 
 
-arrow 
7.0.0.9000
+arrow 
8.0.02022-05-09
+
+Enhancements to dplyr and 
datasets
 
-read_csv_arrow()’s 
readr-style type T is now mapped to timestamp(unit = 
"ns") instead of timestamp(unit = "s").
+open_dataset():correctly
 supports the skip argument for skipping header rows in CSV 
datasets.
+can take a list of datasets with differing schemas and attempt to unify 
the schemas to produce a UnionDataset.
+
+Arrow https://dplyr.tidyverse.org; 
class="external-link">dplyr queries:are supported on 
RecordBatchReader. This allows, for example, results from DuckDB 
to be streamed back into Arrow rather than materialized before continuing the 
pipeline.
+no longer need to materialize the entire result table before writing to a 
dataset if the query contains contains aggregations or joins.
+supports https://dplyr.tidyverse.org/reference/rename.html; 
class="external-link">dplyr::rename_with().
+
+https://dplyr.tidyverse.org/reference/count.html; 
class="external-link">dplyr::count() returns an ungrouped 
dataframe.
+
+
+write_dataset has more options for controlling row group and file 
sizes when writing partitioned datasets, such as max_open_files, 
max_rows_per_file, min_rows_per_group, and 
max_rows_per_group.
 
-lubridate:component extraction functions: 
tz() (timezone), semester() (semester), 
dst() (daylight savings time indicator), https://rdrr.io/r/base/date.html; class="external-link">date() 
(extract date), epiyear() (epiyear), improvements to 
month(), which now works with integer inputs.
-Added make_date()  make_datetime() + 
https://rdrr.io/r/base/ISOdatetime.html; 
class="external-link">ISOdatetime()  https://rdrr.io/r/base/ISOdatetime.html; 
class="external-link">ISOdate() to create date-times from numeric 
representations.
-Added decimal_date() and date_decimal()
+write_csv_arrow accepts a Dataset or an Arrow dplyr 
query.
+Joining one or more datasets while option(use_threads = 
FALSE) no longer crashes R. That option is set by default on 
Windows.
+
+dplyr joins support the suffix argument to handle 
overlap in column names.
+Filtering a Parquet dataset with https://rdrr.io/r/base/NA.html; class="external-link">is.na() 
no longer misses any rows.
+
+map_batches() 
correctly accepts Dataset objects.
+
+
+Enhancements to date and 
time support
+
+read_csv_arrow()’s 
readr-style type T is mapped to timestamp(unit = 
"ns") instead of timestamp(unit = "s").
+For Arrow dplyr queries, added additional https://lubridate.tidyverse.org; class="external-link">lubridate 
features and fixes:New component extraction functions:
+https://lubridate.tidyverse.org/reference/tz.html; 
class="external-link">lubridate::tz() (timezone),
+
+https://lubridate.tidyverse.org/reference/quarter.html; 
class="external-link">lubridate::semester(),
+
+https://lubridate.tidyverse.org/reference/dst.html; 
class="external-link">lubridate::dst() (daylight savings time 
boolean),
+
+https://lubridate.tidyverse.org/reference/date.html; 
class="external-link">lubridate::date(),
+
+https://lubridate.tidyverse.org/reference/year.html; 
class="external-link">lubridate::epiyear() (year according to 
epidemiological week calendar),
+
+
+https://lubridate.tidyverse.org/reference/month.html; 
class="external-link">lubridate::month() works with integer 
inputs.
+
+https://lubridate.tidyverse.org/reference/make_datetime.html; 
class="external-link">lubridate::make_date()  https://lubridate.tidyverse.org/reference/make_datetime.html; 
class="external-link">lubridate::make_datetime() + 
lubridate::ISOdatetime()  lubridate::ISOdate() 
to create date-times from numeric representations.
+
+https://lubridate.tidyverse.org/reference/decimal_date.html; 
class="external-link">lubridate::decimal_date() and https://lubridate.tidyverse.org/re

[arrow-site] 01/01: Backfill R news for 8.0.0 release

2022-05-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch nealrichardson-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-site.git

commit 5df23bcc2f4ba19df45306184bf544963867fcda
Author: Neal Richardson 
AuthorDate: Thu May 12 16:36:51 2022 -0400

Backfill R news for 8.0.0 release


https://github.com/apache/arrow/commit/526fa070c82c0e1c6d26a4c1d06a591b37c05011 
apparently did not make it into the release tag
---
 docs/r/news/index.html | 108 +++--
 1 file changed, 96 insertions(+), 12 deletions(-)


[arrow-site] branch nealrichardson-patch-1 created (now 5df23bcc2f)

2022-05-12 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a change to branch nealrichardson-patch-1
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


  at 5df23bcc2f Backfill R news for 8.0.0 release

This branch includes the following new commits:

 new 5df23bcc2f Backfill R news for 8.0.0 release

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[arrow] branch master updated: ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and arrow_available()

2022-05-10 Thread npr
This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
 new 824f58f7df ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and 
arrow_available()
824f58f7df is described below

commit 824f58f7df4043ba41351ee1b75d4293521e1ad8
Author: Neal Richardson 
AuthorDate: Tue May 10 12:48:22 2022 -0400

ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and arrow_available()

The diff looks bigger than it is because

* Sometimes those changes just resulted in reducing indentation
* I moved arrow_info() and related functions to their own file, and did the 
same with ArrowObject while I was there
* The way we were wrapping testthat::test_that to check whether arrow was 
available had a side effect of creating a closure that stored intermediate 
objects that we reused across tests, and that broke when I removed it.
* I didn't have styler configured correctly in VS Code when I started (because I had upgraded R to 4.2), so to fix what I had already committed unstyled, I ran `make style-all` across everything, which reformatted a bunch of unrelated code.

I tried to pull on all of the threads I noticed where we were doing things in an
unnatural way because we couldn't assume that arrow was present, but there may
be more.

Closes #13086 from nealrichardson/arrow-is-available

Lead-authored-by: Neal Richardson 
Co-authored-by: Jonathan Keane 
Signed-off-by: Neal Richardson 
---
 ci/scripts/r_test.sh  |   6 +-
 dev/tasks/conda-recipes/r-arrow/configure.win |   2 +-
 r/DESCRIPTION |   4 +-
 r/R/array.R   |   4 +-
 r/R/arrow-datum.R |  69 --
 r/R/arrow-info.R  | 185 
 r/R/arrow-object.R|  61 ++
 r/R/arrow-package.R   | 295 ++
 r/R/arrowExports.R|   1 -
 r/R/buffer.R  |   4 +-
 r/R/compression.R |   6 +-
 r/R/compute.R |   8 +-
 r/R/csv.R |   8 +-
 r/R/dataset.R |   2 +-
 r/R/dplyr-datetime-helpers.R  |   7 +-
 r/R/dplyr-funcs-datetime.R|   8 +-
 r/R/dplyr-funcs-string.R  |  16 +-
 r/R/dplyr-funcs-type.R|  12 +-
 r/R/dplyr-funcs.R |  18 +-
 r/R/dplyr-summarize.R |   2 +-
 r/R/extension.R   |  27 +--
 r/R/feather.R |   8 +-
 r/R/field.R   |   4 +-
 r/R/filesystem.R  |   2 +-
 r/R/install-arrow.R   |   2 +-
 r/R/io.R  |  19 +-
 r/R/ipc-stream.R  |   4 +-
 r/R/json.R|   2 +-
 r/R/memory-pool.R |   2 +-
 r/R/message.R |   2 +-
 r/R/parquet.R |   4 +-
 r/R/record-batch-reader.R |   2 +-
 r/R/record-batch-writer.R |   4 +-
 r/R/record-batch.R|  15 +-
 r/R/scalar.R  |   2 +-
 r/R/schema.R  |   4 +-
 r/R/table.R   |   2 +-
 r/R/type.R|  12 +-
 r/_pkgdown.yml|   1 -
 r/configure   |   1 -
 r/configure.win   |   4 +-
 r/data-raw/codegen.R  |  31 +--
 r/man/Field.Rd|   2 -
 r/man/RecordBatchWriter.Rd|   2 -
 r/man/Scalar.Rd   |   2 -
 r/man/arrow_available.Rd  |  50 -
 r/man/arrow_info.Rd   |  32 ++-
 r/man/as_data_type.Rd |   3 +-
 r/man/buffer.Rd   |   2 -
 r/man/call_function.Rd|   2 -
 r/man/codec_is_available.Rd   |   2 -
 r/man/concat_tables.Rd|   2 -
 r/man/data-type.Rd|   2 -
 r/man/infer_type.Rd   |   2 -
 r/man/install_arrow.Rd|   2 +-
 r/man/list_compute_functions.Rd   |   2 -
 r/man/match_arrow.Rd  |   2 -
 r/man/new_extension_type.Rd   |   6 +-
 r/man/read_delim_arrow.Rd |   2 -
 r/man/read_feather.Rd
