[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10035: ARROW-12151: [Docs] Add Jira component + summary conventions to the docs

2021-04-14 Thread GitBox


jorisvandenbossche commented on a change in pull request #10035:
URL: https://github.com/apache/arrow/pull/10035#discussion_r613777042



##
File path: docs/source/developers/contributing.rst
##
@@ -92,7 +92,12 @@ right people see it:
   issue pertains to (for example "Python" or "C++").
 * Also prefix the issue title with the component name in brackets, for example
   ``[Python] issue name`` ; this helps when navigating lists of open issues,
-  and it also makes our changelogs more readable.
+  and it also makes our changelogs more readable. Most prefixes are exactly the
+  same as the **Component** name, with the following exceptions:
+  * **Component:** Continuous Integration — **Summary prefix:** [CI]

Review comment:
   ```suggestion
 same as the **Component** name, with the following exceptions:
   
 * **Component:** Continuous Integration — **Summary prefix:** [CI]
   ```
   
   (rst needs a blank line to separate different levels in the list, I think)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #8648: ARROW-7906: [C++] [Python] Add ORC write support

2021-04-14 Thread GitBox


emkornfield commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-820129581










[GitHub] [arrow] msathis edited a comment on pull request #10026: ARROW-12380: [Rust] [Ballista] Basic scheduler ui

2021-04-14 Thread GitBox


msathis edited a comment on pull request #10026:
URL: https://github.com/apache/arrow/pull/10026#issuecomment-819668534


   https://user-images.githubusercontent.com/2977899/114748895-fc044100-9d6f-11eb-96f2-829a2121d4f3.png
   
   This is how it looks right now. @andygrove 






[GitHub] [arrow] cyb70289 commented on a change in pull request #10009: ARROW-11568: [C++][Compute] Rewrite mode kernel

2021-04-14 Thread GitBox


cyb70289 commented on a change in pull request #10009:
URL: https://github.com/apache/arrow/pull/10009#discussion_r613725348



##
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##
@@ -492,16 +493,8 @@ class ArrayCountOrCompareSorter {
   uint64_t* Sort(uint64_t* indices_begin, uint64_t* indices_end, const ArrayType& values,
  int64_t offset, const ArraySortOptions& options) {
 if (values.length() >= countsort_min_len_ && values.length() > values.null_count()) {
-  c_type min{std::numeric_limits<c_type>::max()};
-  c_type max{std::numeric_limits<c_type>::min()};
-
-  VisitRawValuesInline(
-  values,
-  [&](c_type v) {
-min = std::min(min, v);
-max = std::max(max, v);
-  },
-  []() {});
+  c_type min, max;
+  std::tie(min, max) = GetMinMax(*values.data());

Review comment:
   Small performance improvement for int64narrow sorting.
   ```
   benchmark                             baseline         contender        change %
   ArraySortIndicesInt64Narrow/32768/2   507.257 MiB/sec  632.995 MiB/sec  24.788
   ArraySortIndicesInt64Narrow/32768/10  643.182 MiB/sec  724.483 MiB/sec  12.640
   ```
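
For context on why the min/max pass matters here: a counting sort has to know the value range up front to size its table of counts, so the range scan sits directly on the sorting path. The block below is a minimal, self-contained sketch of that idea using only the standard library. `MinMax` and `CountSortIndices` are hypothetical names chosen for illustration; they are not Arrow's `GetMinMax` or its sorter, and null handling is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper: a single pass over the values to find their range.
// Precondition: values is non-empty.
std::pair<int64_t, int64_t> MinMax(const std::vector<int64_t>& values) {
  assert(!values.empty());
  auto mm = std::minmax_element(values.begin(), values.end());
  return {*mm.first, *mm.second};
}

// Hypothetical counting sort: returns the indices that would sort `values`
// in ascending order, keeping equal values in their original order.
// Assumes max - min is small (a "narrow" value range).
std::vector<uint64_t> CountSortIndices(const std::vector<int64_t>& values) {
  std::vector<uint64_t> indices(values.size());
  if (values.empty()) return indices;

  int64_t min, max;
  std::tie(min, max) = MinMax(values);

  // counts[k + 1] first holds how many values equal min + k; after the
  // prefix sum, counts[k] is the first output slot for the value min + k.
  std::vector<uint64_t> counts(static_cast<size_t>(max - min) + 2, 0);
  for (int64_t v : values) ++counts[static_cast<size_t>(v - min) + 1];
  for (size_t k = 1; k < counts.size(); ++k) counts[k] += counts[k - 1];

  for (uint64_t i = 0; i < values.size(); ++i) {
    indices[counts[static_cast<size_t>(values[i] - min)]++] = i;
  }
  return indices;
}
```

The table of counts grows with `max - min`, which is why a sorter of this kind would only take this path for narrow ranges and fall back to a comparison sort otherwise.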








[GitHub] [arrow] ursabot edited a comment on pull request #9971: [TEST] Revert "ARROW-11475: [C++] Upgrade mimalloc"

2021-04-14 Thread GitBox


ursabot edited a comment on pull request #9971:
URL: https://github.com/apache/arrow/pull/9971#issuecomment-818909439


   Benchmark runs are scheduled for baseline = 3dc01c52197336e528a1461b07fc93dadee0b49b and contender = bd8d9b4a41252f8ec4d1cd3caadd438ab787b8d6. Results will be available as each benchmark for each run completes:
   [Failed] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/2c3beae4-104d-46c6-b5f4-016c8c1cce30...d1338eb0-628e-4a7e-98cd-f37436b2460c/
   [Finished] ursa-thinkcentre-m75q: https://conbench.ursa.dev/compare/runs/575d8bb7-8f35-4939-ae15-99254196051c...8165467f-7901-4c8a-afb0-a8a032099a8c/
   [Finished] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/2e0fbc82-ebaf-44a5-9041-d5d4360d78cd...d0c1ba89-e017-487a-b3f1-d4de66f1f6af/
   [Finished] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/649fcbe5-ab4f-4b20-8714-89cbd3fab999...079ca892-8338-4b86-b202-18e43df2dbcf/
   






[GitHub] [arrow] nevi-me commented on a change in pull request #9993: ARROW-12317: [Rust] JSON writer support for time, duration and date

2021-04-14 Thread GitBox


nevi-me commented on a change in pull request #9993:
URL: https://github.com/apache/arrow/pull/9993#discussion_r613735914



##
File path: rust/arrow/src/json/writer.rs
##
@@ -660,6 +765,138 @@ mod tests {
 );
 }
 
+#[test]
+fn write_dates() {
+let ts_string = "2018-11-13T17:11:10.011375885995";
+let ts_millis = ts_string
+.parse::()
+.unwrap()
+.timestamp_millis();
+
+let arr_date32 = Date32Array::from(vec![
+Some(i32::try_from(ts_millis / 1000 / (60 * 60 * 24)).unwrap()),
+None,
+]);
+let arr_date64 = Date64Array::from(vec![Some(ts_millis), None]);
+let arr_names = StringArray::from(vec![Some("a"), Some("b")]);
+
+let schema = Schema::new(vec![
+Field::new("date32", arr_date32.data_type().clone(), false),
+Field::new("date64", arr_date64.data_type().clone(), false),
+Field::new("name", arr_names.data_type().clone(), false),
+]);
+let schema = Arc::new(schema);
+
+let batch = RecordBatch::try_new(
+schema,
+vec![
+Arc::new(arr_date32),
+Arc::new(arr_date64),
+Arc::new(arr_names),
+],
+)
+.unwrap();
+
+let mut buf = Vec::new();
+{
+let mut writer = LineDelimitedWriter::new(&mut buf);
+writer.write_batches(&[batch]).unwrap();
+}
+
+assert_eq!(
+String::from_utf8(buf).unwrap(),
+r#"{"date32":"2018-11-13","date64":"2018-11-13","name":"a"}
+{"name":"b"}
+"#
+);
+}
+
+#[test]
+fn write_times() {
+let arr_time32sec = Time32SecondArray::from(vec![Some(120), None]);
+let arr_time32msec = Time32MillisecondArray::from(vec![Some(120), None]);
+let arr_time64usec = Time64MicrosecondArray::from(vec![Some(120), None]);
+let arr_time64nsec = Time64NanosecondArray::from(vec![Some(120), None]);
+let arr_names = StringArray::from(vec![Some("a"), Some("b")]);
+
+let schema = Schema::new(vec![
+Field::new("time32sec", arr_time32sec.data_type().clone(), false),
+Field::new("time32msec", arr_time32msec.data_type().clone(), false),
+Field::new("time64usec", arr_time64usec.data_type().clone(), false),
+Field::new("time64nsec", arr_time64nsec.data_type().clone(), false),
+Field::new("name", arr_names.data_type().clone(), false),
+]);
+let schema = Arc::new(schema);
+
+let batch = RecordBatch::try_new(
+schema,
+vec![
+Arc::new(arr_time32sec),
+Arc::new(arr_time32msec),
+Arc::new(arr_time64usec),
+Arc::new(arr_time64nsec),
+Arc::new(arr_names),
+],
+)
+.unwrap();
+
+let mut buf = Vec::new();
+{
+let mut writer = LineDelimitedWriter::new(&mut buf);
+writer.write_batches(&[batch]).unwrap();
+}
+
+assert_eq!(
+String::from_utf8(buf).unwrap(),
+r#"{"time32sec":"00:02:00","time32msec":"00:00:00.120","time64usec":"00:00:00.000120","time64nsec":"00:00:00.00120","name":"a"}
+{"name":"b"}
+"#
+);
+}
+
+#[test]
+fn write_durations() {
+let arr_durationsec = DurationSecondArray::from(vec![Some(120), None]);
+let arr_durationmsec = DurationMillisecondArray::from(vec![Some(120), None]);
+let arr_durationusec = DurationMicrosecondArray::from(vec![Some(120), None]);
+let arr_durationnsec = DurationNanosecondArray::from(vec![Some(120), None]);
+let arr_names = StringArray::from(vec![Some("a"), Some("b")]);
+
+let schema = Schema::new(vec![
+Field::new("duration_sec", arr_durationsec.data_type().clone(), false),
+Field::new("duration_msec", arr_durationmsec.data_type().clone(), false),
+Field::new("duration_usec", arr_durationusec.data_type().clone(), false),
+Field::new("duration_nsec", arr_durationnsec.data_type().clone(), false),
+Field::new("name", arr_names.data_type().clone(), false),
+]);
+let schema = Arc::new(schema);
+
+let batch = RecordBatch::try_new(
+schema,
+vec![
+Arc::new(arr_durationsec),
+Arc::new(arr_durationmsec),
+Arc::new(arr_durationusec),
+Arc::new(arr_durationnsec),
+Arc::new(arr_names),
+],
+)
+.unwrap();
+
+let mut buf = Vec::new();
+{
+let mut writer = LineDelimitedWriter::new(&mut buf);
+writer.write_batches(&[batch]).unwrap();
+}
+
+assert_eq!(
+String::from_utf8(buf).unwrap(),
+

[GitHub] [arrow] nevi-me commented on pull request #10043: ARROW-12250: [Rust] [Parquet] Fix failing arrow_writer test

2021-04-14 Thread GitBox


nevi-me commented on pull request #10043:
URL: https://github.com/apache/arrow/pull/10043#issuecomment-820035191


   CC @andygrove @alamb
   
   I should be back around next week to help with PR reviews, and to catch up on the project :)






[GitHub] [arrow] github-actions[bot] commented on pull request #10043: ARROW-12250: [Rust] [Parquet] Fix failing arrow_writer test

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10043:
URL: https://github.com/apache/arrow/pull/10043#issuecomment-820034395


   https://issues.apache.org/jira/browse/ARROW-12250






[GitHub] [arrow] nevi-me opened a new pull request #10043: ARROW-12250: [Rust] [Parquet] Fix failing arrow_writer test

2021-04-14 Thread GitBox


nevi-me opened a new pull request #10043:
URL: https://github.com/apache/arrow/pull/10043


   A copy-paste mistake when creating the FSB test.
   The sporadic failure happens if two tests try to write to the same file.






[GitHub] [arrow] kou commented on pull request #9155: ARROW-6103: [Release][Java] Remove mvn release plugin

2021-04-14 Thread GitBox


kou commented on pull request #9155:
URL: https://github.com/apache/arrow/pull/9155#issuecomment-820033534


   @andygrove Can we split `mvn deploy` into a task that builds the artifacts and a task that uploads them? In the future, we want to build artifacts in CI and run only the upload task on a local machine.
   
   For example, Gandiva's jar needs to bundle `libgandiva.so` for Linux, `libgandiva.dylib` for macOS, and `gandiva.dll` for Windows. We can't build all of those on a local machine, but we can in CI.
   See also: https://issues.apache.org/jira/browse/ARROW-11135
   






[GitHub] [arrow] cyb70289 commented on a change in pull request #10009: ARROW-11568: [C++][Compute] Rewrite mode kernel

2021-04-14 Thread GitBox


cyb70289 commented on a change in pull request #10009:
URL: https://github.com/apache/arrow/pull/10009#discussion_r613727094



##
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##
@@ -31,340 +33,359 @@ namespace internal {
 
 namespace {
 
+using arrow::internal::checked_pointer_cast;
+using arrow::internal::VisitSetBitRunsVoid;
+
+using ModeState = OptionsWrapper;
+
 constexpr char kModeFieldName[] = "mode";
 constexpr char kCountFieldName[] = "count";
 
-// {value:count} map
-template 
-using CounterMap = std::unordered_map;
-
-// map based counter for floating points
-template 
-enable_if_t::value, CounterMap> 
CountValuesByMap(
-const ArrayType& array, int64_t& nan_count) {
-  CounterMap value_counts_map;
-  const ArrayData& data = *array.data();
-  const CType* values = data.GetValues(1);
-
-  nan_count = 0;
-  if (array.length() > array.null_count()) {
-arrow::internal::VisitSetBitRunsVoid(data.buffers[0], data.offset, 
data.length,
- [&](int64_t pos, int64_t len) {
-   for (int64_t i = 0; i < len; ++i) {
- const auto value = values[pos + 
i];
- if (std::isnan(value)) {
-   ++nan_count;
- } else {
-   ++value_counts_map[value];
- }
-   }
- });
+template 
+Result> PrepareOutput(int64_t n, KernelContext* 
ctx,
+  Datum* out) {
+  const auto& mode_type = TypeTraits::type_singleton();
+  const auto& count_type = int64();
+
+  auto mode_data = ArrayData::Make(mode_type, /*length=*/n, /*null_count=*/0);
+  mode_data->buffers.resize(2, nullptr);
+  auto count_data = ArrayData::Make(count_type, n, 0);
+  count_data->buffers.resize(2, nullptr);
+
+  CType* mode_buffer = nullptr;
+  int64_t* count_buffer = nullptr;
+
+  if (n > 0) {
+ARROW_ASSIGN_OR_RAISE(mode_data->buffers[1], ctx->Allocate(n * 
sizeof(CType)));
+ARROW_ASSIGN_OR_RAISE(count_data->buffers[1], ctx->Allocate(n * 
sizeof(int64_t)));
+mode_buffer = mode_data->template GetMutableValues(1);
+count_buffer = count_data->template GetMutableValues(1);
   }
 
-  return value_counts_map;
-}
-
-// map base counter for non floating points
-template 
-enable_if_t::value, CounterMap> 
CountValuesByMap(
-const ArrayType& array) {
-  CounterMap value_counts_map;
-  const ArrayData& data = *array.data();
-  const CType* values = data.GetValues(1);
-
-  if (array.length() > array.null_count()) {
-arrow::internal::VisitSetBitRunsVoid(data.buffers[0], data.offset, 
data.length,
- [&](int64_t pos, int64_t len) {
-   for (int64_t i = 0; i < len; ++i) {
- ++value_counts_map[values[pos + 
i]];
-   }
- });
-  }
+  const auto& out_type =
+  struct_({field(kModeFieldName, mode_type), field(kCountFieldName, 
count_type)});
+  *out = Datum(ArrayData::Make(out_type, n, {nullptr}, {mode_data, 
count_data}, 0));
 
-  return value_counts_map;
+  return std::make_pair(mode_buffer, count_buffer);
 }
 
-// vector based counter for int8 or integers with small value range
-template 
-CounterMap CountValuesByVector(const ArrayType& array, CType min, CType 
max) {
-  const int range = static_cast(max - min);
-  DCHECK(range >= 0 && range < 64 * 1024 * 1024);
-  const ArrayData& data = *array.data();
-  const CType* values = data.GetValues(1);
-
-  std::vector value_counts_vector(range + 1);
-  if (array.length() > array.null_count()) {
-arrow::internal::VisitSetBitRunsVoid(data.buffers[0], data.offset, 
data.length,
- [&](int64_t pos, int64_t len) {
-   for (int64_t i = 0; i < len; ++i) {
- ++value_counts_vector[values[pos 
+ i] - min];
-   }
- });
-  }
-
-  // Transfer value counts to a map to be consistent with other chunks
-  CounterMap value_counts_map(range + 1);
-  for (int i = 0; i <= range; ++i) {
-CType value = static_cast(i + min);
-int64_t count = value_counts_vector[i];
-if (count) {
-  value_counts_map[value] = count;
+// find top-n value:count pairs with minimal heap
+// suboptimal for tiny or large n, possibly okay as we're not in hot path
+template 
+void Finalize(KernelContext* ctx, Datum* out, Generator&& gen) {
+  using CType = typename InType::c_type;
+
+  using ValueCountPair = std::pair;
+  auto gt = [](const 

[GitHub] [arrow] cyb70289 commented on a change in pull request #10009: ARROW-11568: [C++][Compute] Rewrite mode kernel

2021-04-14 Thread GitBox


cyb70289 commented on a change in pull request #10009:
URL: https://github.com/apache/arrow/pull/10009#discussion_r613724252



##
File path: cpp/src/arrow/util/bit_run_reader.h
##
@@ -480,8 +480,8 @@ Status VisitSetBitRuns(const uint8_t* bitmap, int64_t offset, int64_t length,
 }
 
 template <typename Visit>
-void VisitSetBitRunsVoid(const uint8_t* bitmap, int64_t offset, int64_t length,
- Visit&& visit) {
+inline void VisitSetBitRunsVoid(const uint8_t* bitmap, int64_t offset, int64_t length,

Review comment:
   Add an `inline` hint.
   When the caller is in a .cc source file, the compiler is willing to inline this function. When the caller is in a header file, the compiler prefers not to inline it, even though inlining there does not actually increase binary size compared with a call from a source file. Not inlining causes a big performance drop, because the visitor becomes a function call and cannot be optimized together with the loop.
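
The pattern under discussion can be sketched as follows. This is a simplified stand-in, not Arrow's code: `VisitSetRuns` and `SumValid` are hypothetical names, and a `std::vector<bool>` replaces the real bitmap plus offset and length. The `inline` keyword here is only a hint and does not change behavior; whether the compiler actually inlines the call depends on its heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bit-run visitor: calls visit(pos, len) once for
// every maximal run of `true` entries in `bits`.
template <typename Visit>
inline void VisitSetRuns(const std::vector<bool>& bits, Visit&& visit) {
  const size_t n = bits.size();
  size_t pos = 0;
  while (pos < n) {
    while (pos < n && !bits[pos]) ++pos;  // skip unset entries
    const size_t start = pos;
    while (pos < n && bits[pos]) ++pos;   // extend the current run
    if (pos > start) {
      visit(static_cast<int64_t>(start), static_cast<int64_t>(pos - start));
    }
  }
}

// Caller: sums the values covered by set runs. If VisitSetRuns and the lambda
// are both inlined, the inner loop can be optimized together with the run
// iteration instead of going through a function call per run.
inline int64_t SumValid(const std::vector<int64_t>& values,
                        const std::vector<bool>& valid) {
  assert(values.size() == valid.size());
  int64_t sum = 0;
  VisitSetRuns(valid, [&](int64_t pos, int64_t len) {
    for (int64_t i = 0; i < len; ++i) sum += values[static_cast<size_t>(pos + i)];
  });
  return sum;
}
```

Calling `SumValid` with values `{1, 2, 3, 4, 5}` and validity `{true, true, false, true, false}` visits the runs `(0, 2)` and `(3, 1)` and adds up the first, second, and fourth values.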








[GitHub] [arrow] nealrichardson closed pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson closed pull request #9898:
URL: https://github.com/apache/arrow/pull/9898


   






[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613714655



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,520 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && 
options$save)
+cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#-developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow + change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions and there are differences of opinion across 
developers about if and what the one true way to set up development 
environments like this is.  We also solicit any feedback you have about things 
that are confusing or additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that

Review comment:
   (This is lame and we could do a lot better, but at least we should be factually accurate.)








[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613714530



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,520 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && 
options$save)
+cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#-developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow + change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions and there are differences of opinion across 
developers about if and what the one true way to set up development 
environments like this is.  We also solicit any feedback you have about things 
that are confusing or additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that

Review comment:
   ```suggestion
   On Windows and Linux, you can download a .zip file with the arrow dependencies from the
   nightly repository.
   Windows users then can set the `RWINLIB_LOCAL` environment variable to point to that
   zip file before installing the `arrow` R package. On Linux, you'll need to create a `libarrow` directory inside the R package directory and unzip that file into it. Version numbers in that
   ```








[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613713875



##
File path: r/vignettes/install.Rmd
##
@@ -335,6 +286,16 @@ See discussion 
[here](https://issues.apache.org/jira/browse/ARROW-8556).
 
 ## Summary of build environment variables
 
+Some features are optional when you build Arrow from source
+
+* `ARROW_S3`: If set to `true` S3 support will be built (as long as the 
+  dependencies are met, if they are not met, this may be automatically turned 
off) (default: false)
+* `ARROW_JEMALLOC`: If set to `true` jemalloc support will be included 
(default: true)
+* `ARROW_PARQUET`: If set to `true` parquet support will be included (default: 
true)
+* `ARROW_DATASET`: If set to `true` dataset support will be included (default: 
true)
+* `ARROW_WITH_RE2`: If set to `true` RE2 regular expression library support 
will be included (default: true)
+* `ARROW_WITH_UTF8PROC`: If set to `true` UTF8Proc string library support will 
be included (default: true)
+
 By default, these are all unset. All boolean variables are case-insensitive.

Review comment:
   ```suggestion
   There are a number of other variables that affect the `configure` script and the bundled build script.
   By default, these are all unset. All boolean variables are case-insensitive.
   ```








[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613713587



##
File path: r/vignettes/install.Rmd
##
@@ -335,6 +286,16 @@ See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
 
 ## Summary of build environment variables
 
+Some features are optional when you build Arrow from source
+
+* `ARROW_S3`: If set to `true` S3 support will be built (as long as the 
+  dependencies are met, if they are not met, this may be automatically turned off) (default: false)
+* `ARROW_JEMALLOC`: If set to `true` jemalloc support will be included (default: true)
+* `ARROW_PARQUET`: If set to `true` parquet support will be included (default: true)
+* `ARROW_DATASET`: If set to `true` dataset support will be included (default: true)
+* `ARROW_WITH_RE2`: If set to `true` RE2 regular expression library support will be included (default: true)
+* `ARROW_WITH_UTF8PROC`: If set to `true` UTF8Proc string library support will be included (default: true)

Review comment:
   ```suggestion
   * `ARROW_S3`: If set to `ON` S3 support will be built as long as the 
     dependencies are met; if they are not met, the build script will turn this `OFF`
   * `ARROW_JEMALLOC` for the `jemalloc` memory allocator
   * `ARROW_PARQUET`
   * `ARROW_DATASET`
   * `ARROW_WITH_RE2` for the RE2 regular expression library, used in some string compute functions
   * `ARROW_WITH_UTF8PROC` for the UTF8Proc string library, used in many other string compute functions
   ```

##
File path: r/vignettes/install.Rmd
##
@@ -335,6 +286,16 @@ See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
 
 ## Summary of build environment variables
 
+Some features are optional when you build Arrow from source

Review comment:
   ```suggestion
   Some features are optional when you build Arrow from source. With the exception of `ARROW_S3`, these are all `ON` by default in the bundled C++ build, but you can set them to `OFF` to disable them.
   ```








[GitHub] [arrow] edponce commented on a change in pull request #10016: ARROW-11950: [C++][Compute] Add unary negative kernel

2021-04-14 Thread GitBox


edponce commented on a change in pull request #10016:
URL: https://github.com/apache/arrow/pull/10016#discussion_r613703669



##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##
@@ -309,6 +348,21 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
   return func;
 }
 

Review comment:
   Not sure why we need UnaryScalarFunction and can't use ScalarFunction as 
is. Also, why is *CommonNumeric* using *int8()*?








[GitHub] [arrow] edponce commented on a change in pull request #10016: ARROW-11950: [C++][Compute] Add unary negative kernel

2021-04-14 Thread GitBox


edponce commented on a change in pull request #10016:
URL: https://github.com/apache/arrow/pull/10016#discussion_r613702980



##
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc
##
@@ -709,5 +709,76 @@ TEST(TestBinaryArithmetic, AddWithImplicitCastsUint64EdgeCase) {
  ArrayFromJSON(uint64(), "[18446744073709551615]")}));
 }
 
+TEST(TestUnaryArithmeticSigned, Negate) {
+  for (const auto& ty : internal::SignedIntTypes()) {
+    // No input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[]"), ArrayFromJSON(ty, "[]"));
+    // Null input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
+    // Zeros as inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));
+    // Positive inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[1, 10, 100]"), ArrayFromJSON(ty, "[-1, -10, -100]"));
+    // Negative inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1, -10, -100]"), ArrayFromJSON(ty, "[1, 10, 100]"));
+  }
+}
+
+TEST(TestUnaryArithmeticSignedMinMax, Negate) {
+  // NOTE [EPM]: Can these tests be done by iterating types?
+
+  // Min input
+  // Out-of-bounds after operation (C++ 2's complement wrap around, architecture dependent)
+  auto int8_min = std::numeric_limits<int8_t>::min();
+  CheckScalarUnary("negate", MakeScalar(int8_min), MakeScalar(int8_min));
+  auto int16_min = std::numeric_limits<int16_t>::min();
+  CheckScalarUnary("negate", MakeScalar(int16_min), MakeScalar(int16_min));
+  auto int32_min = std::numeric_limits<int32_t>::min();
+  CheckScalarUnary("negate", MakeScalar(int32_min), MakeScalar(int32_min));
+  auto int64_min = std::numeric_limits<int64_t>::min();
+  CheckScalarUnary("negate", MakeScalar(int64_min), MakeScalar(int64_min));
+
+  // Max input
+  // NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
+  // auto int8_max = std::numeric_limits<int8_t>::max();
+  // CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
+  // auto int16_max = std::numeric_limits<int16_t>::max();
+  // CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));
+  auto int32_max = std::numeric_limits<int32_t>::max();
+  CheckScalarUnary("negate", MakeScalar(int32_max), MakeScalar(-int32_max));
+  auto int64_max = std::numeric_limits<int64_t>::max();
+  CheckScalarUnary("negate", MakeScalar(int64_max), MakeScalar(-int64_max));
+}
+
+TEST(TestUnaryArithmeticUnsigned, Negate) {
+  for (const auto& ty : internal::UnsignedIntTypes()) {
+    // No input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[]"), ArrayFromJSON(ty, "[]"));
+    // Null input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
+    // Zeros as inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0]"), ArrayFromJSON(ty, "[0]"));
+  }
+  // Positive inputs
+  // CheckScalarUnary("negate", ArrayFromJSON(ty, "[1, 10, 100]"), ArrayFromJSON(ty, "[-1, -10, -100]"));
+  // Negative inputs
+  // CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1, -10, -100]"), ArrayFromJSON(ty, "[1, 10, 100]"));
+}
+
+TEST(TestUnaryArithmeticFloating, Negate) {
+  for (const auto& ty : internal::FloatingPointTypes()) {
+    // No input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[]"), ArrayFromJSON(ty, "[]"));
+    // Null input
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[null]"), ArrayFromJSON(ty, "[null]"));
+    // Zeros as inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0.0, 0.0, -0.0]"), ArrayFromJSON(ty, "[0.0, -0.0, 0.0]"));
+    // Positive inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"), ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"));
+    // Negative inputs
+    CheckScalarUnary("negate", ArrayFromJSON(ty, "[-1.3, -10.80, -12748.001]"), ArrayFromJSON(ty, "[1.3, 10.80, 12748.001]"));

Review comment:
   Good corner cases, thanks!
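
On the `NOTE [EPM]` question in the hunk about the commented-out int8/int16 max cases: a likely cause, assuming `MakeScalar` deduces the scalar type from its argument, is C++ integral promotion. Unary minus applied to an `int8_t` or `int16_t` operand yields an `int`, so `MakeScalar(-int8_max)` would receive an `int` rather than an `int8_t`. A minimal sketch of the effect and one possible workaround follows; `NegateSameType` and `CheckInt8Max` are hypothetical helpers for illustration, not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Unary minus promotes small integer types to int (integral promotion),
// so the type of the expression is no longer the operand's type.
static_assert(std::is_same<decltype(-std::int8_t{1}), int>::value,
              "-int8_t has type int");
static_assert(std::is_same<decltype(-std::int16_t{1}), int>::value,
              "-int16_t has type int");

// Hypothetical helper: negate and cast back so the result keeps type T.
// Intended for values whose negation is representable in T (not the minimum).
template <typename T>
T NegateSameType(T value) {
  return static_cast<T>(-value);
}

// Mirrors the commented-out int8 case: with the cast, the expected value is
// an int8_t again instead of an int.
inline void CheckInt8Max() {
  const std::int8_t int8_max = std::numeric_limits<std::int8_t>::max();
  static_assert(
      std::is_same<decltype(NegateSameType(int8_max)), std::int8_t>::value,
      "result keeps the operand type");
  assert(NegateSameType(int8_max) == static_cast<std::int8_t>(-127));
}
```

If this reading is right, writing the expected value as `MakeScalar(static_cast<int8_t>(-int8_max))` (and likewise for int16) should let the two disabled checks compare scalars of the same type.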








[GitHub] [arrow] andygrove commented on pull request #9991: ARROW-12335: [Rust] [Ballista] Use latest DataFusion

2021-04-14 Thread GitBox


andygrove commented on pull request #9991:
URL: https://github.com/apache/arrow/pull/9991#issuecomment-819957641


   Java build failed, unrelated to this PR.






[GitHub] [arrow] kou closed pull request #10027: ARROW-12381: [Packaging][Python] macOS wheels are built with wrong package kind

2021-04-14 Thread GitBox


kou closed pull request #10027:
URL: https://github.com/apache/arrow/pull/10027


   






[GitHub] [arrow] westonpace commented on pull request #9589: ARROW-11797: [C++][Dataset] Provide batch stream Scanner methods

2021-04-14 Thread GitBox


westonpace commented on pull request #9589:
URL: https://github.com/apache/arrow/pull/9589#issuecomment-819954135


   FYI, there are probably some things we could do to improve the JNI build 
(ARROW-11633)






[GitHub] [arrow] lidavidm closed pull request #9589: ARROW-11797: [C++][Dataset] Provide batch stream Scanner methods

2021-04-14 Thread GitBox


lidavidm closed pull request #9589:
URL: https://github.com/apache/arrow/pull/9589


   






[GitHub] [arrow] lidavidm commented on pull request #9589: ARROW-11797: [C++][Dataset] Provide batch stream Scanner methods

2021-04-14 Thread GitBox


lidavidm commented on pull request #9589:
URL: https://github.com/apache/arrow/pull/9589#issuecomment-819947024


   I am not sure why the JNI test is having so much trouble but it passes 
locally under Docker.






[GitHub] [arrow] rok commented on pull request #9758: ARROW-9054: [C++] Add ScalarAggregateOptions

2021-04-14 Thread GitBox


rok commented on pull request #9758:
URL: https://github.com/apache/arrow/pull/9758#issuecomment-819944801


   This is slowly coming together. Remaining todo:
   * min/max must cover R edge cases
   * sum - don't promote type (e.g. int32 to int64) if possible
   * sum / mean - should short circuit when it becomes impossible for non-null 
count to be below min_count.
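
One possible reading of the short-circuit item above, sketched outside Arrow: once the number of non-null values seen reaches `min_count`, the result can no longer be null, so the count no longer needs to be tracked and the rest of the input can go through a simpler loop. The names `SumResult` and `SumWithMinCount` and the validity-vector layout are assumptions made for this sketch, not the kernel's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: is_valid is false when fewer than min_count
// non-null values were seen, in which case value is left at 0.
struct SumResult {
  bool is_valid;
  std::int64_t value;
};

// Hypothetical helper: sums the non-null entries of `values`, where
// validity[i] == true marks values[i] as non-null.
inline SumResult SumWithMinCount(const std::vector<std::int64_t>& values,
                                 const std::vector<bool>& validity,
                                 std::size_t min_count) {
  assert(values.size() == validity.size());
  std::int64_t sum = 0;
  std::size_t non_null = 0;
  std::size_t i = 0;
  // Phase 1: track the non-null count only until the threshold is reached.
  for (; i < values.size() && non_null < min_count; ++i) {
    if (validity[i]) {
      sum += values[i];
      ++non_null;
    }
  }
  // Phase 2: the threshold is met (or the input is exhausted), so the count
  // cannot fall below min_count any more and is no longer updated.
  for (; i < values.size(); ++i) {
    if (validity[i]) sum += values[i];
  }
  const bool ok = non_null >= min_count;
  return SumResult{ok, ok ? sum : 0};
}
```

For example, with values `{1, 2, 3}`, validity `{true, false, true}` and `min_count = 2`, the sketch returns a valid sum of 4; with `min_count = 3` it returns an invalid result.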






[GitHub] [arrow] kou closed pull request #10037: ARROW-12376: [Dev] Log traceback for unexpected exceptions in archery trigger-bot

2021-04-14 Thread GitBox


kou closed pull request #10037:
URL: https://github.com/apache/arrow/pull/10037


   






[GitHub] [arrow] github-actions[bot] commented on pull request #10027: ARROW-12381: [Packaging][Python] macOS wheels are built with wrong package kind

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10027:
URL: https://github.com/apache/arrow/pull/10027#issuecomment-819931553


   Revision: b9e556fdaf7a292156d99f1d5253c9221b4a8c09
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-321](https://github.com/ursacomputing/crossbow/branches/all?query=actions-321)
   
   |Task|Status|
   |----|------|
   |wheel-manylinux2010-cp36-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2010-cp36-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2010-cp36-amd64)|
   |wheel-manylinux2010-cp37-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2010-cp37-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2010-cp37-amd64)|
   |wheel-manylinux2010-cp38-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2010-cp38-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2010-cp38-amd64)|
   |wheel-manylinux2010-cp39-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2010-cp39-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2010-cp39-amd64)|
   |wheel-manylinux2014-cp36-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2014-cp36-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2014-cp36-amd64)|
   |wheel-manylinux2014-cp36-arm64|[![TravisCI](https://img.shields.io/travis/ursacomputing/crossbow/actions-321-travis-wheel-manylinux2014-cp36-arm64.svg)](https://travis-ci.com/ursacomputing/crossbow/branches)|
   |wheel-manylinux2014-cp37-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2014-cp37-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2014-cp37-amd64)|
   |wheel-manylinux2014-cp37-arm64|[![TravisCI](https://img.shields.io/travis/ursacomputing/crossbow/actions-321-travis-wheel-manylinux2014-cp37-arm64.svg)](https://travis-ci.com/ursacomputing/crossbow/branches)|
   |wheel-manylinux2014-cp38-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2014-cp38-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2014-cp38-amd64)|
   |wheel-manylinux2014-cp38-arm64|[![TravisCI](https://img.shields.io/travis/ursacomputing/crossbow/actions-321-travis-wheel-manylinux2014-cp38-arm64.svg)](https://travis-ci.com/ursacomputing/crossbow/branches)|
   |wheel-manylinux2014-cp39-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-manylinux2014-cp39-amd64)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-manylinux2014-cp39-amd64)|
   |wheel-manylinux2014-cp39-arm64|[![TravisCI](https://img.shields.io/travis/ursacomputing/crossbow/actions-321-travis-wheel-manylinux2014-cp39-arm64.svg)](https://travis-ci.com/ursacomputing/crossbow/branches)|
   |wheel-osx-high-sierra-cp36|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-osx-high-sierra-cp36)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-osx-high-sierra-cp36)|
   |wheel-osx-high-sierra-cp37|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-osx-high-sierra-cp37)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-osx-high-sierra-cp37)|
   |wheel-osx-high-sierra-cp38|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-osx-high-sierra-cp38)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-osx-high-sierra-cp38)|
   |wheel-osx-high-sierra-cp39|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-321-github-wheel-osx-high-sierra-cp39)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-321-github-wheel-osx-high-sierra-cp39)|
   |wheel-osx-mavericks-cp36|[![Github 

[GitHub] [arrow] kou commented on pull request #10027: ARROW-12381: [Packaging][Python] macOS wheels are built with wrong package kind

2021-04-14 Thread GitBox


kou commented on pull request #10027:
URL: https://github.com/apache/arrow/pull/10027#issuecomment-819931191


   @github-actions crossbow submit -g wheel






[GitHub] [arrow] kou closed pull request #10036: ARROW-12273: [JS] [Rust] Remove coveralls

2021-04-14 Thread GitBox


kou closed pull request #10036:
URL: https://github.com/apache/arrow/pull/10036


   






[GitHub] [arrow] nealrichardson closed pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


nealrichardson closed pull request #10014:
URL: https://github.com/apache/arrow/pull/10014


   






[GitHub] [arrow] nealrichardson closed pull request #10020: ARROW-12370: [R] Bindings for power kernel

2021-04-14 Thread GitBox


nealrichardson closed pull request #10020:
URL: https://github.com/apache/arrow/pull/10020


   






[GitHub] [arrow] nealrichardson commented on a change in pull request #10020: ARROW-12370: [R] Bindings for power kernel

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #10020:
URL: https://github.com/apache/arrow/pull/10020#discussion_r613651376



##
File path: r/tests/testthat/test-dplyr-filter.R
##
@@ -284,7 +300,7 @@ test_that("filter environment scope", {
   # Also for functions
   # 'could not find function "isEqualTo"' because we haven't defined it yet
   expect_dplyr_error(input %>% filter(isEqualTo(int, 4)), tbl)
-  
+

Review comment:
   ```suggestion
   ```

##
File path: r/tests/testthat/test-dplyr-filter.R
##
@@ -399,5 +415,5 @@ test_that("filter() with .data pronoun", {
   collect(),
 tbl
   )
-  
+

Review comment:
   ```suggestion
   ```








[GitHub] [arrow] github-actions[bot] commented on pull request #10042: ARROW-12395: Create RunInSerialExecutor benchmark

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10042:
URL: https://github.com/apache/arrow/pull/10042#issuecomment-819916433


   https://issues.apache.org/jira/browse/ARROW-12395






[GitHub] [arrow] nealrichardson closed pull request #9950: ARROW-11468: [R] Allow user to pass schema to read_json_arrow()

2021-04-14 Thread GitBox


nealrichardson closed pull request #9950:
URL: https://github.com/apache/arrow/pull/9950


   






[GitHub] [arrow] tifflhl commented on pull request #9368: [WIP] [POC] Flight SQL

2021-04-14 Thread GitBox


tifflhl commented on pull request #9368:
URL: https://github.com/apache/arrow/pull/9368#issuecomment-819915361


   Hi David,
   
   Our resources are a bit limited right now, and we can't really say whether we
   will have the bandwidth to work on flight-sql for v5.0.0. Once we have more
   clarity on when the work can resume, we will definitely notify the dev
   mailing list.
   
   Cheers,
   Tiffany
   
   On Wed, Apr 14, 2021 at 11:03 AM David Li ***@***.***> wrote:
   
   > Following up here @tifflhl , is there
   > interest in trying to get this in for 5.0.0 perhaps?
   
   
   -- 
   Tiffany Lam (She/Her)
   






[GitHub] [arrow] westonpace opened a new pull request #10042: ARROW-12395: Create RunInSerialExecutor benchmark

2021-04-14 Thread GitBox


westonpace opened a new pull request #10042:
URL: https://github.com/apache/arrow/pull/10042


   






[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613644003



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up a development environment that will enable you to write code and run tests locally. It outlines how to build the various components that make up the Arrow project and R package, as well as some common troubleshooting and workflows developers use. Many contributions can be accomplished with the instructions in [R-only development](#r-only-development). But if you're working on both the C++ library and the R package, the [Developer environment setup](#developer-environment-setup) section will guide you through setting up a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but certain custom configurations might conflict with these instructions, and developers differ on whether there is one true way to set up a development environment like this, and on what it is. We also welcome feedback about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 client's) S3 listing functionality to see what is in the bucket `s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if you can’t get a binary version of the latest C++ library elsewhere, you’ll need to build it from source too. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613643774




[GitHub] [arrow] github-actions[bot] commented on pull request #10041: ARROW-12394: [Release] Upload binaries to S3 instead of Bintray temporary

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10041:
URL: https://github.com/apache/arrow/pull/10041#issuecomment-819906261


   https://issues.apache.org/jira/browse/ARROW-12394






[GitHub] [arrow] github-actions[bot] commented on pull request #10040: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10040:
URL: https://github.com/apache/arrow/pull/10040#issuecomment-819904003


   https://issues.apache.org/jira/browse/ARROW-11565






[GitHub] [arrow] github-actions[bot] commented on pull request #10039: ARROW-12390: [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10039:
URL: https://github.com/apache/arrow/pull/10039#issuecomment-819901892


   https://issues.apache.org/jira/browse/ARROW-12390






[GitHub] [arrow] github-actions[bot] commented on pull request #10037: ARROW-12376: [Dev] Log traceback for unexpected exceptions in archery trigger-bot

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10037:
URL: https://github.com/apache/arrow/pull/10037#issuecomment-819899729


   https://issues.apache.org/jira/browse/ARROW-12376






[GitHub] [arrow] github-actions[bot] commented on pull request #10036: ARROW-12273: [JS] [Rust] Remove coveralls

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10036:
URL: https://github.com/apache/arrow/pull/10036#issuecomment-819899257


   https://issues.apache.org/jira/browse/ARROW-12273






[GitHub] [arrow] github-actions[bot] commented on pull request #10035: ARROW-12151: [Docs] Add Jira component + summary conventions to the docs

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10035:
URL: https://github.com/apache/arrow/pull/10035#issuecomment-819898030


   https://issues.apache.org/jira/browse/ARROW-12151






[GitHub] [arrow] github-actions[bot] commented on pull request #10030: ARROW-12384: [JS] Use let/const and clean up eslint rules

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10030:
URL: https://github.com/apache/arrow/pull/10030#issuecomment-819897636


   https://issues.apache.org/jira/browse/ARROW-12384






[GitHub] [arrow] github-actions[bot] commented on pull request #10034: ARROW-12389: [R] [Docs] Add note about autocasting

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10034:
URL: https://github.com/apache/arrow/pull/10034#issuecomment-819895890


   https://issues.apache.org/jira/browse/ARROW-12389






[GitHub] [arrow] github-actions[bot] commented on pull request #10033: ARROW-12388: [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10033:
URL: https://github.com/apache/arrow/pull/10033#issuecomment-819893948


   https://issues.apache.org/jira/browse/ARROW-12388






[GitHub] [arrow] lidavidm closed pull request #10029: ARROW-12382: [C++] Bundle xsimd if runtime SIMD level is set

2021-04-14 Thread GitBox


lidavidm closed pull request #10029:
URL: https://github.com/apache/arrow/pull/10029


   






[GitHub] [arrow] lidavidm commented on pull request #10029: ARROW-12382: [C++] Bundle xsimd if runtime SIMD level is set

2021-04-14 Thread GitBox


lidavidm commented on pull request #10029:
URL: https://github.com/apache/arrow/pull/10029#issuecomment-819891004


   Crossbow passes, CI passes (R failure is fixed on master), merging now and 
hopefully the build report tomorrow will be more cheerful!






[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613626481



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
is. We also welcome feedback about anything that is confusing or any 
additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613625886



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
is. We also welcome feedback about anything that is confusing or any 
additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613623956



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
is. We also welcome feedback about anything that is confusing or any 
additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] github-actions[bot] commented on pull request #10032: ARROW-12185: [R] Bindings for any, all

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#issuecomment-819885621


   https://issues.apache.org/jira/browse/ARROW-12185






[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613623146



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
is. We also welcome feedback about anything that is confusing or any 
additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613620020



##
File path: r/vignettes/install.Rmd
##
@@ -379,6 +330,13 @@ By default, these are all unset. All boolean variables are 
case-insensitive.
   The directory will be created if it does not exist.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the 
`$PATH`
+* `ARROW_S3`: If set to `true` S3 support will be built (as long as the 

Review comment:
   I also moved them up to the top of this section, I think these are 
actually slightly more interesting / important than the other group








[GitHub] [arrow] jorisvandenbossche commented on pull request #9876: ARROW-12188: [Docs] Switch to pydata-sphinx-theme for the main sphinx docs

2021-04-14 Thread GitBox


jorisvandenbossche commented on pull request #9876:
URL: https://github.com/apache/arrow/pull/9876#issuecomment-819881813


   @github-actions crossbow submit test-ubuntu-18.04-docs






[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613615775



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow + change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions and there are differences of opinion across 
developers about if and what the one true way to set up development 
environments like this is.  We also solicit any feedback you have about things 
that are confusing or additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613615098



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && 
options$save)
+cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#-developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow + change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions and there are differences of opinion across 
developers about if and what the one true way to set up development 
environments like this is.  We also solicit any feedback you have about things 
that are confusing or additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+```
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ library 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613614035



##
File path: r/vignettes/developing.Rmd
##

[GitHub] [arrow] lidavidm closed pull request #10019: ARROW-12161: [C++][Dataset] Revert async CSV reader in datasets

2021-04-14 Thread GitBox


lidavidm closed pull request #10019:
URL: https://github.com/apache/arrow/pull/10019


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #10019: ARROW-12161: [C++][Dataset] Revert async CSV reader in datasets

2021-04-14 Thread GitBox


lidavidm commented on pull request #10019:
URL: https://github.com/apache/arrow/pull/10019#issuecomment-819874679


   Ok, I don't think the Travis queue is clearing anytime soon. 
https://github.com/lidavidm/arrow/tree/arrow-11797 which is this + ARROW-11797 
passes Actions/Travis/AppVeyor, so I'll merge this and rebase 11797.






[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613610986



##
File path: r/vignettes/developing.Rmd
##

[GitHub] [arrow] velvia commented on a change in pull request #10005: ARROW-12232: [Rust][DataFusion] Add SQL support for CAST(.. AS Time) millisecond

2021-04-14 Thread GitBox


velvia commented on a change in pull request #10005:
URL: https://github.com/apache/arrow/pull/10005#discussion_r613610417



##
File path: rust/datafusion/src/sql/planner.rs
##
@@ -1498,6 +1498,7 @@ pub fn convert_data_type(sql: ) -> 
Result {
 SQLDataType::Double => Ok(DataType::Float64),
 SQLDataType::Char(_) | SQLDataType::Varchar(_) => Ok(DataType::Utf8),
 SQLDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, 
None)),
+SQLDataType::Time => Ok(DataType::Timestamp(TimeUnit::Millisecond, 
None)),

Review comment:
   Oh, there is also an easier solution.  Expand the `return_type()` method in 
functions.rs to allow passing in the input arrays, so that we could return a 
different type depending on the time granularity argument.
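
   To make the idea concrete, here is a rough, language-agnostic sketch in Python rather than the actual DataFusion code. Everything in it is an assumption for illustration: the `TimeUnit` enum, the `GRANULARITY_UNIT` mapping, and the extra `arg_values` parameter are hypothetical and do not exist in functions.rs. It only shows how a return type could depend on the granularity value once the inputs are visible.

```python
from enum import Enum
from typing import Optional, Sequence


class TimeUnit(Enum):
    SECOND = "s"
    MILLISECOND = "ms"
    MICROSECOND = "us"
    NANOSECOND = "ns"


# Hypothetical mapping from a granularity literal to the result resolution.
GRANULARITY_UNIT = {
    "second": TimeUnit.SECOND,
    "millisecond": TimeUnit.MILLISECOND,
    "microsecond": TimeUnit.MICROSECOND,
}


def return_type(arg_types: Sequence[str],
                arg_values: Optional[Sequence[object]] = None) -> TimeUnit:
    """Pick the result resolution for a date_trunc-like function.

    With only argument types available, the result has to be fixed, so
    nanoseconds is returned. When the argument values are passed as well,
    the granularity literal can select a different resolution.
    """
    if arg_types[0] != "utf8" or not arg_types[1].startswith("timestamp"):
        raise TypeError(f"unsupported argument types: {list(arg_types)}")
    if arg_values is None:
        # Types only: nothing to inspect, fall back to the fixed type.
        return TimeUnit.NANOSECOND
    granularity = str(arg_values[0]).lower()
    # Unknown granularities keep the nanosecond default.
    return GRANULARITY_UNIT.get(granularity, TimeUnit.NANOSECOND)
```

   Calling `return_type(["utf8", "timestamp(ns)"])` keeps the fixed nanosecond answer, while also passing `["millisecond", None]` as the values selects milliseconds.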








[GitHub] [arrow] github-actions[bot] commented on pull request #10031: ARROW-12385: [R] [CI] fix cran picking in CI

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10031:
URL: https://github.com/apache/arrow/pull/10031#issuecomment-819871592


   https://issues.apache.org/jira/browse/ARROW-12385






[GitHub] [arrow] velvia commented on a change in pull request #10005: ARROW-12232: [Rust][DataFusion] Add SQL support for CAST(.. AS Time) millisecond

2021-04-14 Thread GitBox


velvia commented on a change in pull request #10005:
URL: https://github.com/apache/arrow/pull/10005#discussion_r613610055



##
File path: rust/datafusion/src/sql/planner.rs
##
@@ -1498,6 +1498,7 @@ pub fn convert_data_type(sql: ) -> 
Result {
 SQLDataType::Double => Ok(DataType::Float64),
 SQLDataType::Char(_) | SQLDataType::Varchar(_) => Ok(DataType::Utf8),
 SQLDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, 
None)),
+SQLDataType::Time => Ok(DataType::Timestamp(TimeUnit::Millisecond, 
None)),

Review comment:
   @alamb and others: so I found that there is already an existing 
`date_trunc()` function, which always returns Timestamp(Nanos).  The problem 
with the above proposal I put up is:
   * `return_type()` in functions.rs is designed to give the return type 
knowing only the input argument types.  IOW, I cannot produce a different type 
such as Timestamp(Millis) as I cannot inspect the time granularity value, all I 
know is that I have args of type utf8 and timestamp(nanos).  
   * The current function signature is `Exact` and always takes in inputs of 
nanosecond timestamp resolution.  I suppose we can keep that... but we should 
think about what people do when they have different timestamp resolutions.
   
   Basically, there is a larger issue in Arrow/DataFusion: Timestamp columns 
with different time resolutions are considered different types, while most of 
the infrastructure is designed for, or works best with, a single input type and 
a single output type, which is much simpler.  Also, in the SQL world there is 
for the most part just one "Timestamp" type.  `date_trunc()` and other 
functions have been designed to work with nanos.  You can in theory have 
timestamp columns in millis, micros, or seconds, but they don't have the same 
support that nanos do.
   
   So the big question (and I'm not sure if this PR is the right forum for 
this) is which direction should the project be going.  There is a mismatch 
between SQL support, the design to work with one type, and the Arrow support 
for multiple time resolutions.
   
   I can think of some possible directions/solutions, in no particular order:
   - Have most functions work on nano resolution timestamps, and have automatic 
coercion of other resolutions to nanos when calling functions etc.  This 
imposes a perf penalty, but maybe an acceptable one.
   - Add support for specifying the resolution for timestamps.  For example, 
`CAST(xx AS TIMESTAMP(Microseconds))` would be possible.  There's probably 
precedent for this somewhere.  This would still need the coercion from the 
previous option in order to work.
   - Somehow support just a single timestamp type in Arrow, with multiple 
possible resolutions.   I'm not sure how this would even be represented or 
implemented, and it'd be a huge change.  It might match better how things are 
implemented in other parts of the SQL world though.
   
   Would love everyone's thoughts on this.   For now I'd be super happy with 
the first two solutions.
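
   As a rough illustration of the first direction (coerce everything to nanos before calling a nanos-only function), here is a small sketch in plain Python. It is not DataFusion or Arrow code: `NANOS_PER_UNIT`, `coerce_to_nanos`, and `date_trunc_second` are hypothetical names, and timestamps are modelled as plain integers with `None` for nulls.

```python
from typing import List, Optional

# Multiplier that converts a value in the given unit to nanoseconds.
NANOS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def coerce_to_nanos(values: List[Optional[int]], unit: str) -> List[Optional[int]]:
    """Rescale timestamps of any supported unit to nanoseconds, keeping nulls."""
    factor = NANOS_PER_UNIT[unit]
    return [None if v is None else v * factor for v in values]


def date_trunc_second(values_ns: List[Optional[int]]) -> List[Optional[int]]:
    """A nanos-only function: truncate each timestamp down to a whole second."""
    second = NANOS_PER_UNIT["s"]
    return [None if v is None else v - v % second for v in values_ns]


# A millisecond column is coerced first, then handed to the nanos-only function.
millis_column = [1_500, None, 2_000]
truncated = date_trunc_second(coerce_to_nanos(millis_column, "ms"))
```

   The extra pass over the column in `coerce_to_nanos` is the perf penalty mentioned above. The result is also always in nanos, so callers lose the original resolution unless it is cast back afterwards.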








[GitHub] [arrow] jonkeane commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


jonkeane commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613609989



##
File path: r/vignettes/developing.Rmd
##

[GitHub] [arrow] emkornfield commented on issue #1217: How to write data with struct with pyarrow.parquet ?

2021-04-14 Thread GitBox


emkornfield commented on issue #1217:
URL: https://github.com/apache/arrow/issues/1217#issuecomment-819866905


   It looks like you have the args reversed: 
https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow.StructArray.from_arrays






[GitHub] [arrow] kou opened a new pull request #10041: ARROW-12394: [Release] Upload binaries to S3 instead of Bintray temporary

2021-04-14 Thread GitBox


kou opened a new pull request #10041:
URL: https://github.com/apache/arrow/pull/10041


   We should upload to Artifactory but it's not ready yet.
   
   We'll use S3 as a temporary upload location only for 4.0.0. We'll
   rewrite this change once Artifactory is ready.






[GitHub] [arrow] github-actions[bot] commented on pull request #10030: ARROW-12384: [JS] Use let/const and clean up eslint rules

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10030:
URL: https://github.com/apache/arrow/pull/10030#issuecomment-819866071


   https://issues.apache.org/jira/browse/ARROW-12383






[GitHub] [arrow] github-actions[bot] commented on pull request #10029: ARROW-12382: [C++] Bundle xsimd if runtime SIMD level is set

2021-04-14 Thread GitBox


github-actions[bot] commented on pull request #10029:
URL: https://github.com/apache/arrow/pull/10029#issuecomment-819866015


   Revision: 8721f02c379ad4c203eec41b92f54a5afad108ee
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-320](https://github.com/ursacomputing/crossbow/branches/all?query=actions-320)
   
   |Task|Status|
   |----|------|
   |conda-linux-gcc-py38-cpu|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-320-azure-conda-linux-gcc-py38-cpu)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1=actions-320-azure-conda-linux-gcc-py38-cpu)|






[GitHub] [arrow] Balive13 commented on issue #1217: How to write data with struct with pyarrow.parquet ?

2021-04-14 Thread GitBox


Balive13 commented on issue #1217:
URL: https://github.com/apache/arrow/issues/1217#issuecomment-819865523


   The error is related to "arr = pa.StructArray.from_arrays(['key', 'value'], fields)".
   






[GitHub] [arrow] emkornfield commented on issue #1217: How to write data with struct with pyarrow.parquet ?

2021-04-14 Thread GitBox


emkornfield commented on issue #1217:
URL: https://github.com/apache/arrow/issues/1217#issuecomment-819859799


   you probably want `print(str(arr))`






[GitHub] [arrow] ianmcook commented on a change in pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


ianmcook commented on a change in pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#discussion_r613599475



##
File path: r/README.md
##
@@ -4,250 +4,283 @@
 
[![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amaster+event%3Apush)
 
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
 
-[Apache Arrow](https://arrow.apache.org/) is a cross-language
-development platform for in-memory data. It specifies a standardized
+**[Apache Arrow](https://arrow.apache.org/) is a cross-language
+development platform for in-memory data.** It specifies a standardized
 language-independent columnar memory format for flat and hierarchical
 data, organized for efficient analytic operations on modern hardware. It
 also provides computational libraries and zero-copy streaming messaging
 and interprocess communication.
 
-The `arrow` package exposes an interface to the Arrow C++ library to
-access many of its features in R. This includes support for analyzing
-large, multi-file datasets (`open_dataset()`), working with individual
-Parquet (`read_parquet()`, `write_parquet()`) and Feather
-(`read_feather()`, `write_feather()`) files, as well as lower-level
-access to Arrow memory and messages.
+**The `arrow` package exposes an interface to the Arrow C++ library,
+enabling access to many of its features in R.** It provides low-level
+access to the Arrow C++ library API and higher-level access through a
+`dplyr` backend and familiar R functions.
+
+## What can the `arrow` package do?
+
+-   Read and write **Parquet files** (`read_parquet()`,
+    `write_parquet()`), an efficient and widely used columnar format
+-   Read and write **Feather files** (`read_feather()`,
+    `write_feather()`), a format optimized for speed and
+    interoperability

Review comment:
   I added some more detail on Parquet further down. I think for now it 
would be best not to add more links or detail in this section.








[GitHub] [arrow] nealrichardson commented on a change in pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#discussion_r613597360



##
File path: r/vignettes/arrow.Rmd
##
@@ -72,6 +72,45 @@ to other applications and services that use Arrow. One example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
+# Object hierarchy
+
+## Metadata objects

Review comment:
   Re: forward references, I disagree that this is a virtue; I think it's more readable to start with the thing you care about and recurse down into the details as needed. But agreed that we don't have to resolve this today.








[GitHub] [arrow] ianmcook commented on pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


ianmcook commented on pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#issuecomment-819842109


   > I’ve read through the readme + dataset vignettes and these are both great 
and improvements over what we had. I'm torn about this next suggestion so feel 
free to ignore it, but would it be helpful (or more distracting?) to have 
one-line examples in that first group of bullets in the readme to show how easy 
each (or most) of those actions are? They would be kind of floating without 
much context, but it might help get people in and excited to see that it really 
is just `read_parquet("big_file.parquet")` to get started with parquets.
   
   I think for this version, we should keep the bullets very high-level and 
easy for folks to quickly read through. Let's table this for later 
consideration.






[GitHub] [arrow] ianmcook commented on a change in pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


ianmcook commented on a change in pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#discussion_r613596152



##
File path: r/README.md
##
@@ -4,250 +4,283 @@
 
[![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amaster+event%3Apush)
 
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
 
-[Apache Arrow](https://arrow.apache.org/) is a cross-language
-development platform for in-memory data. It specifies a standardized
+**[Apache Arrow](https://arrow.apache.org/) is a cross-language
+development platform for in-memory data.** It specifies a standardized
 language-independent columnar memory format for flat and hierarchical
 data, organized for efficient analytic operations on modern hardware. It
 also provides computational libraries and zero-copy streaming messaging
 and interprocess communication.
 
-The `arrow` package exposes an interface to the Arrow C++ library to
-access many of its features in R. This includes support for analyzing
-large, multi-file datasets (`open_dataset()`), working with individual
-Parquet (`read_parquet()`, `write_parquet()`) and Feather
-(`read_feather()`, `write_feather()`) files, as well as lower-level
-access to Arrow memory and messages.
+**The `arrow` package exposes an interface to the Arrow C++ library,
+enabling access to many of its features in R.** It provides low-level
+access to the Arrow C++ library API and higher-level access through a
+`dplyr` backend and familiar R functions.
+
+## What can the `arrow` package do?
+
+-   Read and write **Parquet files** (`read_parquet()`,
+    `write_parquet()`), an efficient and widely used columnar format
+-   Read and write **Feather files** (`read_feather()`,
+    `write_feather()`), a format optimized for speed and
+    interoperability
+-   Open or write **large, multi-file datasets** with a single function
+    call (`open_dataset()`, `write_dataset()`)
+-   Read **large CSV and JSON files** with excellent **speed and

Review comment:
   I don't really think this is the right place for links; this is meant to be a fairly breezy list of key features, with elaboration provided below. I'd prefer to table this idea for later.









[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613596055



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up a development environment that will enable you to write code and run tests locally. It outlines how to build the various components that make up the Arrow project and R package, as well as common troubleshooting steps and workflows that developers use. Many contributions can be accomplished with the instructions in [R-only development](#r-only-development). But if you're working on both the C++ library and the R package, the [Developer environment setup](#developer-environment-setup) section will guide you through setting up a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R package. Users of the package in R do not need to do any of this setup. If you're looking for how to install Arrow, see [the instructions in the readme](https://arrow.apache.org/docs/r/#installation); Linux users can find more details on building from source at `vignette("install", package = "arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow project grows and changes. We have tried to make these steps as robust as possible (in fact, we even test exactly these instructions on our nightly CI to ensure they don't become stale!), but certain custom configurations might conflict with these instructions, and developers disagree about whether there is one true way to set up a development environment like this, and what it would be. We also welcome any feedback about things that are confusing or additions you would like to see here. Please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see which nightlies are available, list the contents of the bucket `s3://arrow-r-nightly/libarrow/bin` using Arrow's S3 support (or any other S3 client):
+
+``` r
+library(arrow)
+
+# Open the nightly bucket, then list the binaries stored under libarrow/bin
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if you can’t get a binary version of the latest C++ library elsewhere, you’ll need to build it from source too. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613594298



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613593041



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] RezaeiNasab commented on issue #1217: How to write data with struct with pyarrow.parquet ?

2021-04-14 Thread GitBox


RezaeiNasab commented on issue #1217:
URL: https://github.com/apache/arrow/issues/1217#issuecomment-819838110


   As @wesm mentioned above, I wrote the following code in Python:
   
   fields = [pa.array(['foo', 'foo', 'foo']), pa.array(['bar', 'bar', 'bar'])]
   arr = pa.StructArray.from_arrays(['key', 'value'], fields)
   print(arr)
   
   and I get the following error:
   
   expected bytes, pyarrow.lib.StringArray found
   
   why?






[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613590007



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && 
options$save)
+cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#-developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow + change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions and there are differences of opinion across 
developers about if and what the one true way to set up development 
environments like this is.  We also solicit any feedback you have about things 
that are confusing or additions you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if there you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] lidavidm commented on pull request #10019: ARROW-12161: [C++][Dataset] Revert async CSV reader in datasets

2021-04-14 Thread GitBox


lidavidm commented on pull request #10019:
URL: https://github.com/apache/arrow/pull/10019#issuecomment-819835139


   Actually, it looks like between your fork's CI and AppVeyor here, all the 
relevant (C++, Python, R, etc.) CI jobs look good to go, with only Travis being 
queued.






[GitHub] [arrow] ianmcook commented on a change in pull request #10014: ARROW-11477: [R][Doc] Reorganize and improve README and vignette content

2021-04-14 Thread GitBox


ianmcook commented on a change in pull request #10014:
URL: https://github.com/apache/arrow/pull/10014#discussion_r613589255



##
File path: r/vignettes/arrow.Rmd
##
@@ -72,6 +72,45 @@ to other applications and services that use Arrow. One 
example is Spark: the
 move data to and from Spark, yielding [significant performance
 gains](http://arrow.apache.org/blog/2019/01/25/r-spark-improvements/).
 
+# Object hierarchy
+
+## Metadata objects

Review comment:
   Since this has been moved from the README to a less conspicuous place in 
the "Using the Arrow C++ Library in R" vignette, I don't think this is as 
important now. I also believe there are some good reasons to keep it in this 
order, chiefly the fact that it avoids forward references: reading from top to 
bottom, you never come across something that has not already been defined 
above. IMO we should keep this as is for now and plan to refine it for the next 
release.








[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613588777



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) &&
+      options$save) {
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  }
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
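+
The setup chunk above turns each `RUN_DEVDOCS` / `DEVDOCS_*` environment variable into a flag that is on only when the value, lowercased, is exactly `true`. As a rough shell sketch of that same rule (the `is_enabled` helper is hypothetical and not part of the Arrow repository), this can be handy for checking what a given environment would enable:

```shell
# Hypothetical helper, not part of Arrow: mirrors the R expression
#   tolower(Sys.getenv(NAME, "false")) == "true"
# Only exported variables are visible to printenv, as with Sys.getenv() in R.
is_enabled() {
  # Fall back to "false" when the variable is unset
  value="$(printenv "$1" || echo false)"
  # Compare case-insensitively against "true"
  [ "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

export RUN_DEVDOCS=TRUE
if is_enabled RUN_DEVDOCS; then
  echo "RUN_DEVDOCS is enabled"
fi
```

Values such as `1` or `yes` do not count as enabled under this rule.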
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, and covers common troubleshooting steps and workflows 
that developers use. Many contributions can be accomplished with the 
instructions in [R-only development](#r-only-development). But if you're 
working on both the C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a developer environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is one 
true way to set up a development environment like this and, if so, what it is. 
We also welcome feedback about things that are confusing or additions you would 
like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
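+
As a minimal sketch of the step above, assuming the nightly zip has already been downloaded (the `use_local_arrow_zip` helper and the example path are hypothetical, and the commands assume a POSIX-style shell such as the one shipped with Rtools on Windows):

```shell
# Hypothetical helper: point the arrow R package build at a local zip
use_local_arrow_zip() {
  zip_path="$1"
  # Fail early if the zip is missing, rather than partway through the install
  if [ ! -f "$zip_path" ]; then
    echo "No such file: $zip_path" >&2
    return 1
  fi
  # Must be exported so the R process started afterwards can see it
  export RWINLIB_LOCAL="$zip_path"
  echo "RWINLIB_LOCAL=$RWINLIB_LOCAL"
}

# Example usage (the path is a placeholder), run from the r/ directory of an
# Arrow checkout:
#   use_local_arrow_zip "$HOME/Downloads/arrow-nightly.zip" && R CMD INSTALL .
```

The variable needs to be set in the same shell session that runs the install, since the text above requires it to be in place before the `arrow` R package is installed.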
+
+To see what nightlies are available, you can use Arrow's (or any other S3 
client's) S3 listing functionality to see what is in the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] lidavidm commented on pull request #10019: ARROW-12161: [C++][Dataset] Revert async CSV reader in datasets

2021-04-14 Thread GitBox


lidavidm commented on pull request #10019:
URL: https://github.com/apache/arrow/pull/10019#issuecomment-819834271


   I'll give CI some more time but I'll merge this tonight and then rebase 
ARROW-11797. I've speculatively rebased the latter onto this branch at: 
https://github.com/lidavidm/arrow/tree/arrow-11797 so we can catch anything in 
CI; hopefully we'll have this all merged tonight or early tomorrow.






[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613588393



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613588241



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] anthonylouisbsb opened a new pull request #10040: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings - WIP

2021-04-14 Thread GitBox


anthonylouisbsb opened a new pull request #10040:
URL: https://github.com/apache/arrow/pull/10040


   This finishes the implementation that was started in pull request 
https://github.com/apache/arrow/pull/9450.






[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613587160



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613586959



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613586609



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) &&
+      options$save) {
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  }
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's S3 listing 
functionality (or any other S3 client) to inspect the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+# List the nightly libarrow binaries available in the bucket
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613586073



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's S3 listing 
functionality (or any other S3 client) to inspect the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+# List the nightly libarrow binaries available in the bucket
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613585820



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's S3 listing 
functionality (or any other S3 client) to inspect the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+# List the nightly libarrow binaries available in the bucket
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613585163



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's S3 listing 
functionality (or any other S3 client) to inspect the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+# List the nightly libarrow binaries available in the bucket
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613583477



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies 
from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
+To see what nightlies are available, you can use Arrow's S3 listing 
functionality (or any other S3 client) to inspect the bucket 
`s3://arrow-r-nightly/libarrow/bin`:
+
+``` r
+library(arrow)
+# List the nightly libarrow binaries available in the bucket
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if 
you can’t get a binary version of the latest C++ library elsewhere, you’ll need 
to build it from source too. This section discusses how to set up a C++ build 
configured to work with the R package. For more general resources, see the 
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613582678



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
+---
+title: "Arrow R Developer Guide"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Arrow R Developer Guide}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup options, include=FALSE}
+knitr::opts_chunk$set(error = TRUE, eval = FALSE)
+
+# Get environment variables describing what to evaluate
+run <- tolower(Sys.getenv("RUN_DEVDOCS", "false")) == "true"
+macos <- tolower(Sys.getenv("DEVDOCS_MACOS", "false")) == "true"
+ubuntu <- tolower(Sys.getenv("DEVDOCS_UBUNTU", "false")) == "true"
+sys_install <- tolower(Sys.getenv("DEVDOCS_SYSTEM_INSTALL", "false")) == "true"
+
+# Update the source knit_hook to save the chunk (if it is marked to be saved)
+knit_hooks_source <- knitr::knit_hooks$get("source")
+knitr::knit_hooks$set(source = function(x, options) {
+  # Extra paranoia about when this will write the chunks to the script, we will
+  # only save when:
+  #   * CI is true
+  #   * RUN_DEVDOCS is true
+  #   * options$save is TRUE (and a check that not NULL won't crash it)
+  if (as.logical(Sys.getenv("CI", FALSE)) && run && !is.null(options$save) && options$save)
+    cat(x, file = "script.sh", append = TRUE, sep = "\n")
+  # but hide the blocks we want hidden:
+  if (!is.null(options$hide) && options$hide) {
+    return(NULL)
+  }
+  knit_hooks_source(x, options)
+})
+```
+
+```{bash, save=run, hide=TRUE}
+# Stop on failure, echo input as we go
+set -e
+set -x
+```
+
+If you're looking to contribute to `arrow`, this document can help you set up 
a development environment that will enable you to write code and run tests 
locally. It outlines how to build the various components that make up the Arrow 
project and R package, as well as some common troubleshooting and workflows 
developers use. Many contributions can be accomplished with the instructions in 
[R-only development](#r-only-development). But if you're working on both the 
C++ library and the R package, the [Developer environment 
setup](#developer-environment-setup) section will guide you through setting up 
a development environment.
+
+This document is intended only for developers of Apache Arrow or the Arrow R 
package. Users of the package in R do not need to do any of this setup. If 
you're looking for how to install Arrow, see [the instructions in the 
readme](https://arrow.apache.org/docs/r/#installation); Linux users can find 
more details on building from source at `vignette("install", package = 
"arrow")`.
+
+This document is a work in progress and will grow and change as the Apache Arrow 
project grows and changes. We have tried to make these steps as robust as 
possible (in fact, we even test exactly these instructions on our nightly CI to 
ensure they don't become stale!), but certain custom configurations might 
conflict with these instructions, and developers differ on whether there is 
one true way to set up a development environment like this, and on what it 
would be. We also welcome feedback about things that are confusing or additions 
you would like to see here. Please [report an 
issue](https://issues.apache.org/jira/projects/ARROW/issues) if you see 
anything that is confusing, odd, or just plain wrong.
+
+## R-only development
+
+Windows and macOS users who wish to contribute to the R package and
+don’t need to alter the Arrow C++ library may be able to obtain a
+recent version of the library without building from source. On macOS,
+you may install the C++ library using [Homebrew](https://brew.sh/):
+
+``` shell
+# For the released version:
+brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
+```
+
+On Windows and Linux, you can download a .zip file with the arrow dependencies from the
+nightly repository,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. Version numbers in that
+repository correspond to dates, and you will likely want the most recent.
+
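The step above can be sketched in shell. This is only an illustration, not part of the reviewed vignette: the file names are assumed (a version followed by a `YYYYMMDD` date), `latest_nightly` is a hypothetical helper, and the final path is a placeholder for wherever you saved the download.

```shell
# Hypothetical helper: read one file name per line on stdin, keep only .zip
# entries, and print the one that sorts last. With a fixed prefix and a
# YYYYMMDD date in the name, the last entry is the most recent nightly.
latest_nightly() {
  grep '\.zip$' | LC_ALL=C sort | tail -n 1
}

# Assumed example names; the real bucket listing may use a different scheme.
zip="$(printf '%s\n' \
  arrow-3.0.0.20210412.zip \
  arrow-3.0.0.20210414.zip \
  arrow-3.0.0.20210413.zip | latest_nightly)"
echo "$zip"

# Point the arrow R package installation at the downloaded file
# (illustrative path: adjust to where the zip actually lives).
export RWINLIB_LOCAL="$PWD/$zip"
```

With `RWINLIB_LOCAL` exported in the same session, installing the `arrow` R package afterwards should pick up the local zip instead of downloading one.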
+To see which nightlies are available, list the contents of the bucket `s3://arrow-r-nightly/libarrow/bin` with Arrow's S3 support (or any other S3 client):
+
+```
+library(arrow)
+nightly <- s3_bucket("arrow-r-nightly")
+nightly$ls("libarrow/bin")
+```
+
+## Developer environment setup
+
+If you need to alter both the Arrow C++ library and the R package code, or if you can’t get a binary version of the latest C++ library elsewhere, you’ll need to build it from source too. This section discusses how to set up a C++ build configured to work with the R package. For more general resources, see the
[Arrow C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp/building.html).
+
+### Install dependencies {.tabset}
+
+The Arrow C++ 

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613582394



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@

[GitHub] [arrow] nealrichardson commented on a change in pull request #9898: ARROW-12017: [R] [Documentation] Make proper developing arrow docs

2021-04-14 Thread GitBox


nealrichardson commented on a change in pull request #9898:
URL: https://github.com/apache/arrow/pull/9898#discussion_r613580971



##
File path: r/vignettes/developing.Rmd
##
@@ -0,0 +1,510 @@
