[jira] [Created] (ARROW-10953) CLONE - [R] as.data.frame.Table crashes R with schema and no record batches

2020-12-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-10953:
---

 Summary: CLONE - [R] as.data.frame.Table crashes R with schema and 
no record batches
 Key: ARROW-10953
 URL: https://issues.apache.org/jira/browse/ARROW-10953
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 2.0.0
 Environment: > sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bigrquery_1.3.2        bigrquerystorage_0.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           cellranger_1.1.0     pillar_1.4.6
 [4] compiler_4.0.3       dbplyr_2.0.0         tools_4.0.3
 [7] odbc_1.3.0           getPass_0.2-2        digest_0.6.27
[10] bit_4.0.4            gargle_0.5.0         jsonlite_1.7.1
[13] memoise_1.1.0        lifecycle_0.2.0      tibble_3.0.4
[16] pkgconfig_2.0.3      rlang_0.4.8          extraw_1.8.25
[19] DBI_1.1.0            rstudioapi_0.13      curl_4.3
[22] xml2_1.3.2           dplyr_1.0.2          httr_1.4.2
[25] askpass_1.1          fs_1.5.0             generics_0.1.0
[28] vctrs_0.3.5          hms_0.5.3            bit64_4.0.5
[31] tidyselect_1.1.0     glue_1.4.2           data.table_1.13.2
[34] R6_2.5.0             readxl_1.3.1         connect.cap_0.3.19
[37] purrr_0.3.4          blob_1.2.1           magrittr_2.0.1
[40] ellipsis_0.3.1       assertthat_0.2.1     keyring_1.1.0
[43] arrow_2.0.0.20201117 openssl_1.4.3        crayon_1.3.4
Reporter: Bruno Tremblay
 Fix For: 3.0.0


The objective is to build a 0-row data.frame using the field definitions from 
an Arrow schema.

{code}
library(arrow)

# IPC stream containing only a schema
stream<-as.raw(c(255,255,255,255,16,1,0,0,16,0,0,0,0,0,10,0,12,0,6,0,5,0,8,0,10,0,0,0,0,1,3,0,12,0,0,0,8,0,8,0,0,0,4,0,8,0,0,0,4,0,0,0,4,0,0,0,160,0,0,0,92,0,0,0,48,0,0,0,4,0,0,0,128,255,255,255,0,0,1,5,20,0,0,0,12,0,0,0,4,0,0,0,0,0,0,0,176,255,255,255,7,0,0,0,82,69,80,79,78,83,69,0,168,255,255,255,0,0,1,5,20,0,0,0,12,0,0,0,4,0,0,0,0,0,0,0,216,255,255,255,6,0,0,0,68,69,84,65,73,76,0,0,208,255,255,255,0,0,1,5,24,0,0,0,16,0,0,0,4,0,0,0,0,0,0,0,4,0,4,0,4,0,0,0,8,0,0,0,68,65,84,65,84,89,80,69,0,0,0,0,16,0,20,0,8,0,6,0,7,0,12,0,0,0,16,0,16,0,0,0,0,0,1,7,36,0,0,0,20,0,0,0,4,0,0,0,0,0,0,0,8,0,12,0,4,0,8,0,8,0,0,0,38,0,0,0,9,0,0,0,8,0,0,0,77,65,67,84,65,95,73,68,0,0,0,0,0,0,0,0))
readr <- RecordBatchStreamReader$create(stream)
readr$read_table()
# Error in Table__from_RecordBatchStreamReader(self) : 
# Invalid: Must pass at least one record batch or an explicit Schema
# Now trying to be too clever
tb <- Table$create(data.frame(), schema = readr$schema)
dtf <- as.data.frame(tb)
# This crashes the R session
{code}

Tested on nightly, with the same behavior. It's borderline between a bug and a 
feature request, but to be a drop-in replacement for some DBI methods, it needs 
to be able to build a 0-row data.frame with the correct class for each column.

Thank you and have a nice day.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11071) [R][CI] Use processx to set up minio and flight servers in tests

2020-12-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11071:
---

 Summary: [R][CI] Use processx to set up minio and flight servers 
in tests
 Key: ARROW-11071
 URL: https://issues.apache.org/jira/browse/ARROW-11071
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 3.0.0


Rather than relying on them being set up outside of the tests. processx is 
already a transitive test dependency (testthat uses it), so there's no reason 
for us not to.



[jira] [Created] (ARROW-11079) [R] Catch up on changelog since 2.0

2020-12-30 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11079:
---

 Summary: [R] Catch up on changelog since 2.0
 Key: ARROW-11079
 URL: https://issues.apache.org/jira/browse/ARROW-11079
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 3.0.0






[jira] [Created] (ARROW-11080) [C++][Dataset] Improvements to implicit casting

2020-12-30 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11080:
---

 Summary: [C++][Dataset] Improvements to implicit casting
 Key: ARROW-11080
 URL: https://issues.apache.org/jira/browse/ARROW-11080
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 3.0.0


Followup to ARROW-10322. In ARROW-9187, where we started making use of more 
compute functions in R, we found a couple of places where implicit casts 
weren't being inserted where they should have been:

* https://github.com/apache/arrow/pull/8947/commits/843ff2a39d8a4e1c92247fb672567c0b85b4f45a#diff-79100695986bbd6a63704fe9f238ce3ae9a39ddd093b7f6b213d4a722309d20aR576
  "Function multiply_checked has no kernel matching input types (scalar[double], array[int32])"
* https://github.com/apache/arrow/pull/8947/commits/843ff2a39d8a4e1c92247fb672567c0b85b4f45a#diff-79100695986bbd6a63704fe9f238ce3ae9a39ddd093b7f6b213d4a722309d20aR590
  "Function add_checked has no kernel matching input types (array[double], array[int32])", 
  because implicit casts are only applied to scalars to cast them to the type of 
  the other argument

This may speak to a need for more rules around how inputs should be 
cast/promoted in different contexts.
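A minimal sketch of the kind of promotion rule this ticket asks for, written in 
C++ against a hypothetical, simplified type enum ({{NumType}}) rather than 
Arrow's real {{DataType}} classes. It illustrates casting both operands, array 
or scalar alike, to one common type before kernel lookup, so that 
{{array[double]}} plus {{array[int32]}} would become double plus double and 
{{add_checked}} could dispatch. The specific rule (floating point wins, 
otherwise the wider integer wins) is an assumption for illustration, not 
Arrow's implemented behavior.

```cpp
#include <cassert>

// Hypothetical, simplified numeric type tags for illustration only;
// this is not Arrow's arrow::Type::type enum.
enum class NumType { Int8, Int16, Int32, Int64, Float32, Float64 };

bool IsFloating(NumType t) {
  return t == NumType::Float32 || t == NumType::Float64;
}

// Width in bits of each signed integer or floating-point tag.
int BitWidth(NumType t) {
  switch (t) {
    case NumType::Int8: return 8;
    case NumType::Int16: return 16;
    case NumType::Int32:
    case NumType::Float32: return 32;
    case NumType::Int64:
    case NumType::Float64: return 64;
  }
  return 0;  // unreachable for valid tags
}

// Common type that both operands of a binary arithmetic function would be
// cast to before kernel lookup, whether each operand is an array or a scalar.
// Assumed rule: any floating-point operand forces a floating-point result;
// float32 is kept only when both operands are float32, and mixing float32
// with any integer conservatively yields float64. With two integers, the
// wider one wins (all tags here are signed, so no sign handling is needed).
NumType CommonNumericType(NumType a, NumType b) {
  if (IsFloating(a) || IsFloating(b)) {
    return (a == NumType::Float32 && b == NumType::Float32) ? NumType::Float32
                                                             : NumType::Float64;
  }
  return BitWidth(a) >= BitWidth(b) ? a : b;
}
```

Under this rule, {{CommonNumericType(NumType::Float64, NumType::Int32)}} yields 
{{NumType::Float64}}, so both inputs in the {{add_checked}} example would be 
cast to double before dispatch.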



[jira] [Created] (ARROW-11092) [CI] (Temporarily) move offending workflows to separate files

2020-12-31 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11092:
---

 Summary: [CI] (Temporarily) move offending workflows to separate 
files
 Key: ARROW-11092
 URL: https://issues.apache.org/jira/browse/ARROW-11092
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 3.0.0


Without warning, INFRA broke several of our GitHub Actions workflows, and have 
been unresponsive all week. See 
https://issues.apache.org/jira/browse/INFRA-21239. Since then, the Rust 
developers have removed their offending actions, so those are no longer 
blocked. This PR does harm reduction for C++ and R workflows, moving the 
workflows that INFRA doesn't like to their own files (temporarily, I hope, 
while this business gets sorted out). This enables the other workflows in each 
file to run, so we at least get some C++ and R tests running, and we can still 
verify on our personal forks the workflows that have been blocked on 
apache/arrow.



[jira] [Created] (ARROW-11136) [R] Bindings for is.nan

2021-01-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11136:
---

 Summary: [R] Bindings for is.nan
 Key: ARROW-11136
 URL: https://issues.apache.org/jira/browse/ARROW-11136
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Jonathan Keane


ARROW-11043 added this compute kernel in C++, so we should wire it up in R



[jira] [Created] (ARROW-11152) [CI][C++] Fix Homebrew numpy installation on macOS builds

2021-01-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11152:
---

 Summary: [CI][C++] Fix Homebrew numpy installation on macOS builds
 Key: ARROW-11152
 URL: https://issues.apache.org/jira/browse/ARROW-11152
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Neal Richardson
 Fix For: 3.0.0


NumPy fails to install with Homebrew because it tries to upgrade gcc and hits a 
{{brew link}} error. Running {{brew unlink gcc@8 gcc@9}} before {{brew 
install}} could work around this.



[jira] [Created] (ARROW-11153) [C++][Packaging] Move debian/ubuntu/centos packaging off of Travis-CI

2021-01-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11153:
---

 Summary: [C++][Packaging] Move debian/ubuntu/centos packaging off 
of Travis-CI
 Key: ARROW-11153
 URL: https://issues.apache.org/jira/browse/ARROW-11153
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging
Reporter: Neal Richardson
Assignee: Kouhei Sutou
 Fix For: 3.0.0


Per mailing list discussion



[jira] [Created] (ARROW-11154) [CI][C++] Move homebrew crossbow tests off of Travis-CI

2021-01-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11154:
---

 Summary: [CI][C++] Move homebrew crossbow tests off of Travis-CI
 Key: ARROW-11154
 URL: https://issues.apache.org/jira/browse/ARROW-11154
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging
Reporter: Neal Richardson
 Fix For: 3.0.0






[jira] [Created] (ARROW-11155) [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI

2021-01-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11155:
---

 Summary: [C++][Packaging] Move gandiva crossbow jobs off of 
Travis-CI
 Key: ARROW-11155
 URL: https://issues.apache.org/jira/browse/ARROW-11155
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging
Reporter: Neal Richardson
 Fix For: 3.0.0






[jira] [Created] (ARROW-11176) [R] Expose memory pool name and document setting it

2021-01-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11176:
---

 Summary: [R] Expose memory pool name and document setting it
 Key: ARROW-11176
 URL: https://issues.apache.org/jira/browse/ARROW-11176
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Jonathan Keane
 Fix For: 4.0.0


Followup to ARROW-11009, which did this in C++ and added the binding in Python. 
This could be useful not only for debugging but also for benchmarking.



[jira] [Created] (ARROW-11210) [CI] Restore workflows that had been blocked by INFRA

2021-01-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11210:
---

 Summary: [CI] Restore workflows that had been blocked by INFRA
 Key: ARROW-11210
 URL: https://issues.apache.org/jira/browse/ARROW-11210
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 3.0.0


See INFRA-21239, ARROW-11092, ARROW-11132



[jira] [Created] (ARROW-11217) [C++] Runtime SIMD check on Apple hardware missing

2021-01-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11217:
---

 Summary: [C++] Runtime SIMD check on Apple hardware missing
 Key: ARROW-11217
 URL: https://issues.apache.org/jira/browse/ARROW-11217
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
 Fix For: 3.0.0


[~jeroenooms] hit a crash in the "sum" compute kernel using the R package on a 
new M1 machine running the Rosetta emulator: 
https://gist.github.com/jeroen/c60548b29ff7f6807a6554799bd01cb7

According to 
https://developer.apple.com/documentation/apple_silicon/about_the_rosetta_translation_environment,
 we should be checking sysctlbyname for AVX* capabilities, but we are not. We 
only use that function in 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/cpu_info.cc#L350-L359
 to check cpu cache size. 
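A minimal sketch of the kind of runtime check Apple's Rosetta page describes, 
assuming C++ and the BSD {{sysctlbyname()}} call from {{<sys/sysctl.h>}}. The 
helper names are hypothetical. Apple's page gives {{hw.optional.avx512f}} as 
its example key; {{hw.optional.avx2_0}} is an assumption to verify. A missing 
key is treated as "not supported", which is the safe default when choosing 
between a SIMD kernel and a scalar fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns true only if the named sysctl exists on this machine and reports a
// nonzero value. If sysctlbyname() fails (e.g. the key is absent), the
// feature is reported as unavailable.
bool SysctlFlagEnabled(const std::string& name) {
#if defined(__APPLE__)
  int value = 0;
  std::size_t size = sizeof(value);
  if (sysctlbyname(name.c_str(), &value, &size, nullptr, 0) != 0) {
    return false;
  }
  return value != 0;
#else
  (void)name;  // not macOS: a different detection path (e.g. CPUID) applies
  return false;
#endif
}

// Hypothetical helpers; verify the key names against Apple's documentation.
bool HasAvx2() { return SysctlFlagEnabled("hw.optional.avx2_0"); }
bool HasAvx512F() { return SysctlFlagEnabled("hw.optional.avx512f"); }
```

The design choice is to require a positive answer from the OS before taking 
an AVX code path, rather than trusting compile-time flags alone.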

This may also explain a crash we observed previously on a very old macOS CRAN 
machine. 

I think we should resolve this before the 3.0 release if possible, in order to 
avoid bug reports as more people get M1s. 

cc [~apitrou] [~uwe] [~kou] [~frankdu] [~yibo]



[jira] [Created] (ARROW-11240) [Packaging][R] Add mimalloc to R packaging

2021-01-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11240:
---

 Summary: [Packaging][R] Add mimalloc to R packaging
 Key: ARROW-11240
 URL: https://issues.apache.org/jira/browse/ARROW-11240
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Packaging, R
Reporter: Neal Richardson
 Fix For: 3.0.0


See also ARROW-11231

Relevant scripts:

* ci/scripts/PKGBUILD
* dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb
* r/inst/build_arrow_static.sh



[jira] [Created] (ARROW-11247) [C++] Infer date32 columns in CSV

2021-01-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11247:
---

 Summary: [C++] Infer date32 columns in CSV
 Key: ARROW-11247
 URL: https://issues.apache.org/jira/browse/ARROW-11247
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Jared Lander
Assignee: Neal Richardson
 Fix For: 3.0.0


See ARROW-11243 for the original report



[jira] [Created] (ARROW-11277) [C++] Fix compilation error in dataset expressions on macOS 10.11

2021-01-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11277:
---

 Summary: [C++] Fix compilation error in dataset expressions on 
macOS 10.11
 Key: ARROW-11277
 URL: https://issues.apache.org/jira/browse/ARROW-11277
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
Assignee: Ben Kietzman


See https://github.com/autobrew/homebrew-core/pull/61#issuecomment-761605455

R binary packages for macOS are built with an old SDK, so this is needed. 



[jira] [Created] (ARROW-11338) [R] Bindings for quantile and median

2021-01-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11338:
---

 Summary: [R] Bindings for quantile and median 
 Key: ARROW-11338
 URL: https://issues.apache.org/jira/browse/ARROW-11338
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Following ARROW-10831



[jira] [Created] (ARROW-11350) [C++] Bump dependency versions

2021-01-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11350:
---

 Summary: [C++] Bump dependency versions
 Key: ARROW-11350
 URL: https://issues.apache.org/jira/browse/ARROW-11350
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11392) [R] Remove ARROW_R_WITH_ARROW flags

2021-01-26 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11392:
---

 Summary: [R] Remove ARROW_R_WITH_ARROW flags
 Key: ARROW-11392
 URL: https://issues.apache.org/jira/browse/ARROW-11392
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


ARROW-10735 did the first part of this. Once we're sure that we want to fully 
remove the wrapping, we should:

* Remove all references to ARROW_R_WITH_ARROW
* Remove arrow_available() function and all references to it (arrow must always 
be available)
* Update docs to remove mention of the possibility that you could have a 
package installation that doesn't do anything
* Remove all references to TEST_R_WITH_ARROW environment variable and remove 
the r_only() test wrapper



[jira] [Created] (ARROW-11423) [R] value_counts and some StructArray methods

2021-01-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11423:
---

 Summary: [R] value_counts and some StructArray methods
 Key: ARROW-11423
 URL: https://issues.apache.org/jira/browse/ARROW-11423
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


Exposing value_counts() is useful for exploration, even if it is limited to 
counting over a single (non-struct) array. And since it returns a StructArray, 
I found it useful to implement some more methods on that object.



[jira] [Created] (ARROW-11424) [C++] Add more StructType and StructArray methods

2021-01-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11424:
---

 Summary: [C++] Add more StructType and StructArray methods
 Key: ARROW-11424
 URL: https://issues.apache.org/jira/browse/ARROW-11424
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


A StructType is basically a Schema (vector of Fields), right? Likewise, a 
StructArray is pretty much the same as a RecordBatch, right? Schema and 
RecordBatch have many more methods than StructType/StructArray, but we should 
be able to do the same kinds of things to structs.

Also, an observation while working on ARROW-11423: the methods to extract an 
Array column from a StructArray are called {{field()}} and {{GetFieldByName()}}, 
which is confusing since Schema/StructType is what contains Field objects, and 
{{StructArray::field()}} returns an Array, not a Field.

cc [~bkietz] [~apitrou]



[jira] [Created] (ARROW-11441) [R] Read CSV from character vector

2021-01-30 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11441:
---

 Summary: [R] Read CSV from character vector
 Key: ARROW-11441
 URL: https://issues.apache.org/jira/browse/ARROW-11441
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


`readr::read_csv()` lets you read in data from a character vector, useful for 
(e.g.) taking the results of a system call and reading it in as a data.frame. 

{code}
> readr::read_csv(c("a,b", "1,2", "3,4"))
# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     4
{code}

One solution would be similar to ARROW-9235, perhaps, treating it as a 
textConnection. 



[jira] [Created] (ARROW-11460) [R] Use system compression libraries if present on Linux

2021-02-01 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11460:
---

 Summary: [R] Use system compression libraries if present on Linux
 Key: ARROW-11460
 URL: https://issues.apache.org/jira/browse/ARROW-11460
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson


We vendor/bundle all compression libraries and have them disabled in the 
default build. This is reliable, but it would be nice to use system libraries 
if they're present. 

It's not as simple as setting {{ARROW_DEPENDENCY_SOURCE=AUTO}} because we have 
to know if we're using them in order to set the right `-lwhatever` flags in the 
R package build. Maybe these can be determined from the C++ build/cmake output 
rather than detected outside the build (but this may require ARROW-6312).



[jira] [Created] (ARROW-11474) [C++] Update bundled re2 version

2021-02-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11474:
---

 Summary: [C++] Update bundled re2 version
 Key: ARROW-11474
 URL: https://issues.apache.org/jira/browse/ARROW-11474
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


I tried increasing the re2 version to 2020-11-01 in 

but it failed in a few builds with 

{code}
/usr/bin/ar: 
/root/rpmbuild/BUILD/apache-arrow-3.1.0.dev107/cpp/build/re2_ep-install/lib/libre2.a:
 No such file or directory
make[2]: *** [release/libarrow_bundled_dependencies.a] Error 9
make[1]: *** [src/arrow/CMakeFiles/arrow_bundled_dependencies.dir/all] Error 2
{code}

(or similar). My theory is that something changed in their cmake build setup so 
that either libre2.a is not where we expect it, or it's building a shared 
library instead, or something.



[jira] [Created] (ARROW-11475) [C++] Upgrade mimalloc

2021-02-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11475:
---

 Summary: [C++] Upgrade mimalloc
 Key: ARROW-11475
 URL: https://issues.apache.org/jira/browse/ARROW-11475
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


I tried this in ARROW-11350 but ran into an issue 
(https://github.com/microsoft/mimalloc/issues/353). That has since been 
resolved and we could apply a patch to bring it in. Or we can wait for it to 
get into a proper release.

There is also now a 1.7 release, which claims to work on the Apple M1, as well 
as a 2.0 version, which claims better performance. 



[jira] [Created] (ARROW-11486) [Website] jekyll build fails with Ruby 3.0

2021-02-03 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11486:
---

 Summary: [Website] jekyll build fails with Ruby 3.0
 Key: ARROW-11486
 URL: https://issues.apache.org/jira/browse/ARROW-11486
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Website
Reporter: Neal Richardson
Assignee: Kouhei Sutou


See https://github.com/apache/arrow-site/runs/1786669028?check_suite_focus=true 
for example. This started failing when the default ruby version increased from 
2.7 to 3.0. Pinning the ruby version to 2.7 fixed it 
(https://github.com/apache/arrow-site/pull/92/commits/b1b8c4fc9138b28ede427967e37da70e12670969);
 maybe that's good enough?



[jira] [Created] (ARROW-11499) [Packaging] Remove all use of bintray

2021-02-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11499:
---

 Summary: [Packaging] Remove all use of bintray
 Key: ARROW-11499
 URL: https://issues.apache.org/jira/browse/ARROW-11499
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Packaging
Reporter: Neal Richardson
 Fix For: 4.0.0


Bintray is being shut down on May 1, and we may lose the ability to upload to 
it as early as February 28. 

https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/

Feel free to make subtasks to break out this work.



[jira] [Created] (ARROW-11501) [C++] endianness check does not work on Solaris

2021-02-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11501:
---

 Summary: [C++] endianness check does not work on Solaris
 Key: ARROW-11501
 URL: https://issues.apache.org/jira/browse/ARROW-11501
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson


{code}
In file included from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/type_traits.h:26:0,
                 from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/scalar.h:36,
                 from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/datum.h:28,
                 from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/dataset/expression.h:32,
                 from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/dataset/dataset.h:28,
                 from /export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/dataset/dataset.cc:18:
/export/home/XXVfZhv/Rtemp/RtmpoK9Cps/file3f4a341e5d8f/cpp/src/arrow/util/bit_util.h:26:42: fatal error: endian.h: No such file or directory
{code}

Googling the error message shows some known issues and workarounds for this on 
Solaris, e.g.:

* https://github.com/Sereal/Sereal/issues/139
* https://gitlab.torproject.org/legacy/trac/-/issues/11426
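A minimal sketch of one way to avoid {{<endian.h>}} entirely, assuming C++11: 
prefer the byte-order macros that GCC (4.6 and later) and Clang predefine, and 
fall back to a runtime probe when they are absent. The names 
({{IsLittleEndian}}, {{SKETCH_HAVE_COMPILER_ENDIAN}}) are hypothetical, this is 
not taken from {{bit_util.h}}, and whether it resolves the Solaris build is 
untested.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compile-time answer from compiler-predefined macros. GCC (4.6+) and Clang
// define __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__, so this branch needs no
// platform header such as <endian.h>.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#define SKETCH_HAVE_COMPILER_ENDIAN 1
constexpr bool kCompilerSaysLittleEndian =
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
#endif

// Portable runtime fallback: store 1 in a 16-bit integer and inspect the byte
// at the lowest address; it is 0x01 only on a little-endian machine.
inline bool IsLittleEndianRuntime() {
  const std::uint16_t probe = 0x0001;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0x01;
}

// Uses the compiler's answer when available, else the runtime probe.
inline bool IsLittleEndian() {
#if defined(SKETCH_HAVE_COMPILER_ENDIAN)
  return kCompilerSaysLittleEndian;
#else
  return IsLittleEndianRuntime();
#endif
}
```

Both paths should agree on any given machine; the compile-time path lets 
endian-specific code be selected without a runtime branch.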

cc [~kiszk]




[jira] [Created] (ARROW-11500) [R] Allow bundled build script to run on Solaris

2021-02-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11500:
---

 Summary: [R] Allow bundled build script to run on Solaris
 Key: ARROW-11500
 URL: https://issues.apache.org/jira/browse/ARROW-11500
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


Minor changes that allow us to at least attempt a build on Solaris. Does not 
resolve C++ build issues



[jira] [Created] (ARROW-11507) [R] Bindings for GetRuntimeInfo

2021-02-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11507:
---

 Summary: [R] Bindings for GetRuntimeInfo
 Key: ARROW-11507
 URL: https://issues.apache.org/jira/browse/ARROW-11507
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11513) [R] Bindings for sub/gsub

2021-02-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11513:
---

 Summary: [R] Bindings for sub/gsub
 Key: ARROW-11513
 URL: https://issues.apache.org/jira/browse/ARROW-11513
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11514) [R] Bindings for str_c

2021-02-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11514:
---

 Summary: [R] Bindings for str_c
 Key: ARROW-11514
 URL: https://issues.apache.org/jira/browse/ARROW-11514
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11515) [R] Bindings for strsplit

2021-02-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11515:
---

 Summary: [R] Bindings for strsplit
 Key: ARROW-11515
 URL: https://issues.apache.org/jira/browse/ARROW-11515
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


{{split_pattern}} is the name of the corresponding C++ compute function.



[jira] [Created] (ARROW-11516) [R] Allow all C++ compute functions to be called by name in dplyr

2021-02-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11516:
---

 Summary: [R] Allow all C++ compute functions to be called by name 
in dplyr
 Key: ARROW-11516
 URL: https://issues.apache.org/jira/browse/ARROW-11516
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


Followup to ARROW-9856. Use list_compute_functions (added here) to make all 
Arrow C++ compute functions available directly by name (in case you want to use 
the non-checked arithmetic, or an ASCII-specific kernel, or something without a 
natural R analogue). Will require a bit more refactoring to handle variable 
numbers of args, as well as some additional options handling.



[jira] [Created] (ARROW-11589) [R] Add methods for modifying Schemas

2021-02-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11589:
---

 Summary: [R] Add methods for modifying Schemas
 Key: ARROW-11589
 URL: https://issues.apache.org/jira/browse/ARROW-11589
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Carl Boettiger
 Fix For: 4.0.0


Add $<-, [[<-, and (probably) [<- methods. We have the extraction methods 
implemented but not the replacement ones, which would be useful.

Motivating use case: schema detection for a dataset misreads a column, so take 
the autodetected schema, modify one field, and then re-create the dataset with 
the correct schema.



[jira] [Created] (ARROW-11591) [C++] Prototype version of hash aggregation

2021-02-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11591:
---

 Summary: [C++] Prototype version of hash aggregation
 Key: ARROW-11591
 URL: https://issues.apache.org/jira/browse/ARROW-11591
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11610) [C++] Download boost from sourceforge instead of bintray

2021-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11610:
---

 Summary: [C++] Download boost from sourceforge instead of bintray
 Key: ARROW-11610
 URL: https://issues.apache.org/jira/browse/ARROW-11610
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


e.g. 
https://sourceforge.net/projects/boost/files/boost/1.67.0/boost_1_67_0.tar.gz



[jira] [Created] (ARROW-11611) [C++] Move third party dependency mirrors from bintray

2021-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11611:
---

 Summary: [C++] Move third party dependency mirrors from bintray
 Key: ARROW-11611
 URL: https://issues.apache.org/jira/browse/ARROW-11611
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


We added copies of these to our own bintray a while back to work around rate 
limiting. We should either remove them or update them and move them elsewhere.



[jira] [Created] (ARROW-11612) [C++] Rebuild trimmed boost bundle

2021-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11612:
---

 Summary: [C++] Rebuild trimmed boost bundle
 Key: ARROW-11612
 URL: https://issues.apache.org/jira/browse/ARROW-11612
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


And host somewhere other than bintray. We can prune it further now that we've 
dropped boost::regex, too.



[jira] [Created] (ARROW-11613) [R] Move nightly C++ builds off of bintray

2021-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11613:
---

 Summary: [R] Move nightly C++ builds off of bintray
 Key: ARROW-11613
 URL: https://issues.apache.org/jira/browse/ARROW-11613
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0






[jira] [Created] (ARROW-11657) [R] group_by with .drop specified errors

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11657:
---

 Summary: [R] group_by with .drop specified errors
 Key: ARROW-11657
 URL: https://issues.apache.org/jira/browse/ARROW-11657
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


cf. https://github.com/tidyverse/dplyr/issues/5763



[jira] [Created] (ARROW-11658) [R] Handle mutate/rename inside group_by

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11658:
---

 Summary: [R] Handle mutate/rename inside group_by
 Key: ARROW-11658
 URL: https://issues.apache.org/jira/browse/ARROW-11658
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Followup to ARROW-11657



[jira] [Created] (ARROW-11659) [R] Preserve group_by .drop argument

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11659:
---

 Summary: [R] Preserve group_by .drop argument
 Key: ARROW-11659
 URL: https://issues.apache.org/jira/browse/ARROW-11659
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-11660) [C++] Move RecordBatch::SelectColumns method from R to C++ library

2021-02-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11660:
---

 Summary: [C++] Move RecordBatch::SelectColumns method from R to 
C++ library
 Key: ARROW-11660
 URL: https://issues.apache.org/jira/browse/ARROW-11660
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson
 Fix For: 4.0.0


Table has a proper SelectColumns method in the C++ library, but the RecordBatch 
one is in the R library and should be pushed down to C++.
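For illustration only, here is a minimal C++ sketch of what selecting columns by index involves. It uses a hypothetical, simplified {{ToyBatch}} struct rather than Arrow's {{RecordBatch}}, whose actual method signature is not given in this ticket. It keeps the requested columns and their names in the order given and rejects out-of-range indices.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified record batch: one name per column, and every
// column holds the same number of rows.
struct ToyBatch {
  std::vector<std::string> names;
  std::vector<std::vector<double>> columns;
};

// Return a new batch containing only the columns at `indices`, in the order
// given. An out-of-range index throws instead of reading past the end.
ToyBatch SelectColumns(const ToyBatch& batch, const std::vector<int>& indices) {
  ToyBatch out;
  for (int i : indices) {
    if (i < 0 || static_cast<std::size_t>(i) >= batch.columns.size()) {
      throw std::out_of_range("column index out of range: " + std::to_string(i));
    }
    const auto pos = static_cast<std::size_t>(i);
    out.names.push_back(batch.names[pos]);
    out.columns.push_back(batch.columns[pos]);
  }
  return out;
}
```

The point of the ticket is that this logic would live once in the C++ library, as it already does for Table, instead of in the R bindings.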





[jira] [Created] (ARROW-11672) [R] Fix string function test failure on R 3.3

2021-02-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11672:
---

 Summary: [R] Fix string function test failure on R 3.3
 Key: ARROW-11672
 URL: https://issues.apache.org/jira/browse/ARROW-11672
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


https://github.com/ursacomputing/crossbow/runs/1916519092#step:7:389

This test was added in ARROW-9856





[jira] [Created] (ARROW-11683) [R] Support dplyr::mutate()

2021-02-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11683:
---

 Summary: [R] Support dplyr::mutate()
 Key: ARROW-11683
 URL: https://issues.apache.org/jira/browse/ARROW-11683
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-11693) [C++] Add string length kernel

2021-02-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11693:
---

 Summary: [C++] Add string length kernel
 Key: ARROW-11693
 URL: https://issues.apache.org/jira/browse/ARROW-11693
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


We have "binary_length", but it counts bytes, whereas here we need the number 
of UTF-8 characters. Example (from R):

{code}
> string <- "áéíóú"
> nchar(string)
[1] 5
> arrow:::call_function("binary_length", Scalar$create(string))
Scalar
10
{code}
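To make the difference concrete: each of the five accented characters above takes two bytes in UTF-8, so a byte count gives 10. A minimal C++ sketch (hypothetical helper names, not Arrow's kernel code) that counts code points instead, assuming valid UTF-8 input, skips continuation bytes, which have the bit pattern 10xxxxxx:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Byte count: what "binary_length" reports for a string value.
std::size_t ByteLength(std::string_view s) { return s.size(); }

// Code point count, assuming valid UTF-8: every code point has exactly one
// leading byte, so count the bytes that are not continuation bytes
// (continuation bytes have the bit pattern 10xxxxxx).
std::size_t Utf8Length(std::string_view s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++count;
  }
  return count;
}
```

For the string above, written with escapes as {{"\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA"}}, ByteLength returns 10 (the binary_length result) and Utf8Length returns 5 (the nchar() result).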

cc [~maartenbreddels] [~apitrou] [~jorisvandenbossche]





[jira] [Created] (ARROW-11699) [R] Implement dplyr::across()

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11699:
---

 Summary: [R] Implement dplyr::across()
 Key: ARROW-11699
 URL: https://issues.apache.org/jira/browse/ARROW-11699
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson


It's not a generic, but because it seems to be called only inside functions 
like `mutate()`, we can insert our own version of it into the NSE data mask.





[jira] [Created] (ARROW-11700) [R] Internationalize error handling in tidy eval

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11700:
---

 Summary: [R] Internationalize error handling in tidy eval
 Key: ARROW-11700
 URL: https://issues.apache.org/jira/browse/ARROW-11700
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


We have 

{code}
  tryCatch(eval_tidy(expr, mask), error = function(e) {
    # Look for the cases where bad input was given, i.e. this would fail
    # in regular dplyr anyway, and let those raise those as errors;
    # else, for things not supported by Arrow return a "try-error",
    # which we'll handle differently
    msg <- conditionMessage(e)
    # TODO: internationalization?
    if (grepl("object '.*'.not.found", msg)) {
      stop(e)
    }
    if (grepl('could not find function ".*"', msg)) {
      stop(e)
    }
    invisible(structure(msg, class = "try-error", condition = e))
  })
{code}

and tests for this behavior, but the tests are skipped: base R translates these 
messages, so the patterns only match correctly in an English locale.

We can generate these regular expressions dynamically by triggering the R 
errors on a known nonexistent object:

{code}
> tryCatch(X_X, error = function(e) conditionMessage(e))
[1] "object 'X_X' not found"
> tryCatch(X_X(), error = function(e) conditionMessage(e))
[1] "could not find function \"X_X\""
> sub("X_X", ".*", tryCatch(X_X, error = function(e) conditionMessage(e)))
[1] "object '.*' not found"
{code}

And this will respect i18n:

{code}
> Sys.setenv(LANGUAGE="FR_fr")
> sub("X_X", ".*", tryCatch(X_X, error = function(e) conditionMessage(e)))
[1] "objet '.*' introuvable"
{code}
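One caveat with the R snippet above: the translated message is passed to sub() and then used as a pattern unescaped, so if a translation contained regex metacharacters such as parentheses, they would be treated as pattern syntax. A hedged C++ sketch of the same idea that escapes the message first (the helper names are hypothetical, and this is not how the arrow R package does it):

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Escape characters that are special in ECMAScript regular expressions, so
// that translated message text is matched literally.
std::string EscapeRegex(const std::string& text) {
  static const std::string kSpecial = R"(\^$.|?*+()[]{})";
  std::string out;
  for (char c : text) {
    if (kSpecial.find(c) != std::string::npos) out += '\\';
    out += c;
  }
  return out;
}

// Build a pattern from an error message generated for a known sentinel name
// (e.g. the message for a nonexistent object "X_X"): escape the message, then
// replace the sentinel with ".*" so it matches any object name.
std::regex PatternFromTemplate(const std::string& message,
                               const std::string& sentinel) {
  std::string pattern = EscapeRegex(message);
  const std::string escaped_sentinel = EscapeRegex(sentinel);
  const std::size_t pos = pattern.find(escaped_sentinel);
  if (pos != std::string::npos) {
    pattern.replace(pos, escaped_sentinel.size(), ".*");
  }
  return std::regex(pattern);
}
```

As in the R version, the template message would be captured at runtime in the current locale, so the resulting pattern follows whatever language the messages are translated into.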







[jira] [Created] (ARROW-11701) [R] Implement dplyr::relocate()

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11701:
---

 Summary: [R] Implement dplyr::relocate()
 Key: ARROW-11701
 URL: https://issues.apache.org/jira/browse/ARROW-11701
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


It is a generic, so we can support it properly. It allows column reordering, 
either called directly or via the .before/.after arguments to mutate(). We can 
implement this with the current C++ backend support.





[jira] [Created] (ARROW-11702) [R] Enable ungrouped aggregations in non-Dataset expressions

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11702:
---

 Summary: [R] Enable ungrouped aggregations in non-Dataset 
expressions
 Key: ARROW-11702
 URL: https://issues.apache.org/jira/browse/ARROW-11702
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson


Things like {{mutate(table, x_norm = x / mean(x, na.rm = TRUE))}} could be 
supported for queries on Table/RecordBatch (but not yet on Dataset), but even 
so there are lots of gotchas, such as the order of evaluation when building up 
a lazy query (e.g. whether the aggregation is evaluated before or after a 
filter expression that may change its result).
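A minimal C++ sketch of the evaluation-order gotcha, using plain vectors and hypothetical helpers rather than Arrow expressions: normalizing by a mean computed before a filter gives different values than normalizing by the mean of the rows that survive the filter.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean; returns NaN for an empty input (0.0 / 0.0).
double Mean(const std::vector<double>& x) {
  return std::accumulate(x.begin(), x.end(), 0.0) / static_cast<double>(x.size());
}

// Keep only the values strictly greater than `threshold` (the filter step).
std::vector<double> FilterGreater(const std::vector<double>& x, double threshold) {
  std::vector<double> out;
  for (double v : x) {
    if (v > threshold) out.push_back(v);
  }
  return out;
}

// Divide every value by a precomputed aggregate (the mutate step).
std::vector<double> Normalize(const std::vector<double>& x, double mean) {
  std::vector<double> out;
  out.reserve(x.size());
  for (double v : x) out.push_back(v / mean);
  return out;
}
```

With x = {1, 2, 3, 4}, the mean before filtering to x > 2 is 2.5 and the mean after is 3.5, so the two orderings disagree on every row that is kept.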





[jira] [Created] (ARROW-11703) [R] Implement dplyr::arrange()

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11703:
---

 Summary: [R] Implement dplyr::arrange()
 Key: ARROW-11703
 URL: https://issues.apache.org/jira/browse/ARROW-11703
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Only for Table/RecordBatch for now. The compute module now has sorting 
functions (https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions), 
and I think they already have Python bindings.





[jira] [Created] (ARROW-11704) [R] Wire up dplyr::mutate() for datasets

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11704:
---

 Summary: [R] Wire up dplyr::mutate() for datasets
 Key: ARROW-11704
 URL: https://issues.apache.org/jira/browse/ARROW-11704
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-11705) [R] Support scalar value recycling in RecordBatch/Table$create()

2021-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11705:
---

 Summary: [R] Support scalar value recycling in 
RecordBatch/Table$create()
 Key: ARROW-11705
 URL: https://issues.apache.org/jira/browse/ARROW-11705
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Compare:

{code}
> tibble::tibble(a=1:5, b = 42)
# A tibble: 5 x 2
      a     b
  <int> <dbl>
1     1    42
2     2    42
3     3    42
4     4    42
5     5    42
> arrow::record_batch(a=1:5, b = 42)
Error: Invalid: All arrays must have the same length
{code}
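A minimal sketch of the tibble-style rule in C++, using a hypothetical {{RecycleScalars}} helper over {{std::vector<double>}} columns rather than Arrow arrays: length-1 columns are repeated to the common length, and any other length mismatch is still an error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Column = std::vector<double>;

// Tibble-style recycling: the common length is the first column length other
// than 1 (or 1 if every column is a scalar). Length-1 columns are repeated to
// that length; any other mismatch is an error.
std::vector<Column> RecycleScalars(std::vector<Column> columns) {
  std::size_t target = 1;
  for (const Column& col : columns) {
    if (col.size() != 1) {
      target = col.size();
      break;
    }
  }
  for (Column& col : columns) {
    if (col.size() == target) continue;
    if (col.size() != 1) {
      throw std::invalid_argument("All arrays must have the same length");
    }
    const double value = col[0];  // copy first: assign() must not alias its argument
    col.assign(target, value);
  }
  return columns;
}
```

With the inputs from the example above, the second column {42} becomes five copies of 42, while two columns of lengths 2 and 3 still raise the same error Arrow reports today.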





[jira] [Created] (ARROW-11734) [C++] vendored safe-math.h does not compile on Solaris

2021-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11734:
---

 Summary: [C++] vendored safe-math.h does not compile on Solaris
 Key: ARROW-11734
 URL: https://issues.apache.org/jira/browse/ARROW-11734
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-11735) [R] Allow parquet to be an optional component like S3

2021-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11735:
---

 Summary: [R] Allow parquet to be an optional component like S3
 Key: ARROW-11735
 URL: https://issues.apache.org/jira/browse/ARROW-11735
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Parquet requires Thrift, and it seems that Thrift (at least as of version 0.12) 
does not compile on Solaris. We could debug that, or we could make Parquet an 
optional feature in the R bindings. That might be worthwhile anyway, since it 
would let one build a lighter, minimal R package.





[jira] [Created] (ARROW-11736) [R] Allow string compute functions to be optional

2021-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11736:
---

 Summary: [R] Allow string compute functions to be optional
 Key: ARROW-11736
 URL: https://issues.apache.org/jira/browse/ARROW-11736
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


The Solaris build fails to create {{libarrow_bundled_dependencies.a}} because of 
a mismatch in the arguments passed to the {{ar}} command: 

{code}
[ 19%] Bundling /export/home/XnknpBn/Rtemp/RtmpBOhxfH/file66df7a592ae4/release/libarrow_bundled_dependencies.a
gmake[2]: Entering directory '/export/home/XnknpBn/Rtemp/RtmpBOhxfH/file66df7a592ae4'
usage: ar -d[-SvV] archive file ...
       ar -m[-abiSvV] [posname] archive file ...
       ar -p[-vV][-sS] archive [file ...]
       ar -q[-cuvSV] [-abi] [posname] [file ...]
       ar -r[-cuvSV] [-abi] [posname] [file ...]
       ar -t[-vV][-sS] archive [file ...]
       ar -x[-vV][-sSCT] archive [file ...]
gmake[2]: *** [src/arrow/CMakeFiles/arrow_bundled_dependencies.dir/build.make:61: release/libarrow_bundled_dependencies.a] Error 1
{code}

If ARROW_PARQUET=OFF (ARROW-11735), the only dependencies to bundle are re2 and 
utf8proc. So we could either fix the {{ar}} invocation or make re2 and utf8proc 
optional. Build-wise, they are already optional, but some tests call the string 
kernels, and we'd need to know that they should be skipped (i.e. another option 
in {{skip_if_not_available()}}).





[jira] [Created] (ARROW-11737) [C++] Patch vendored xxhash for Solaris

2021-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11737:
---

 Summary: [C++] Patch vendored xxhash for Solaris 
 Key: ARROW-11737
 URL: https://issues.apache.org/jira/browse/ARROW-11737
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


It fails to compile, but interestingly, just as I was looking into the error, I 
saw that the issue had been fixed _today_ in xxhash: 
https://github.com/Cyan4973/xxHash/pull/498

So I think we just need to apply this patch.





[jira] [Created] (ARROW-11740) [C++] posix_memalign not declared in scope on Solaris

2021-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11740:
---

 Summary: [C++] posix_memalign not declared in scope on Solaris
 Key: ARROW-11740
 URL: https://issues.apache.org/jira/browse/ARROW-11740
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson


{code}
[ 27%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/memory_pool.cc.o
/export/home/X4HzInm/Rtemp/Rtmp1Zx7Xc/file1f6372fd66ce/cpp/src/arrow/memory_pool.cc: In static member function ‘static arrow::Status arrow::{anonymous}::SystemAllocator::AllocateAligned(int64_t, uint8_t**)’:
/export/home/X4HzInm/Rtemp/Rtmp1Zx7Xc/file1f6372fd66ce/cpp/src/arrow/memory_pool.cc:187:64: error: ‘posix_memalign’ was not declared in this scope
   static_cast(size));
{code}





[jira] [Created] (ARROW-11752) [R] Replace usage of testthat::expect_is()

2021-02-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11752:
---

 Summary: [R] Replace usage of testthat::expect_is()
 Key: ARROW-11752
 URL: https://issues.apache.org/jira/browse/ARROW-11752
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Per https://testthat.r-lib.org/reference/expect_is.html, it has been superseded. 
We have ~180 instances of it in our tests that should be upgraded.





[jira] [Created] (ARROW-11754) [R] Support dplyr::compute()

2021-02-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11754:
---

 Summary: [R] Support dplyr::compute()
 Key: ARROW-11754
 URL: https://issues.apache.org/jira/browse/ARROW-11754
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson


See discussion at 
https://github.com/apache/arrow/pull/9521#discussion_r581367505





[jira] [Created] (ARROW-11755) [R] Add tests from dplyr/test-mutate.r

2021-02-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11755:
---

 Summary: [R] Add tests from dplyr/test-mutate.r
 Key: ARROW-11755
 URL: https://issues.apache.org/jira/browse/ARROW-11755
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Review 
https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r and 
port tests over to arrow as needed to see if there are edge cases we aren't 
covering appropriately.





[jira] [Created] (ARROW-11785) [R] Fallback when filtering Table with if_any() expression fails

2021-02-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11785:
---

 Summary: [R] Fallback when filtering Table with if_any() 
expression fails
 Key: ARROW-11785
 URL: https://issues.apache.org/jira/browse/ARROW-11785
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


{code}
> iris %>% record_batch() %>%
+   filter(if_any(ends_with("Width"), ~ . > 4))
Warning: Filter expression not implemented in Arrow: if_any(ends_with("Width"), ~. > 4); pulling data into R
Error: Cannot extract rows with an object of class NULL
{code}






[jira] [Created] (ARROW-11832) [R] Handle conversion of extra nested struct column

2021-03-01 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11832:
---

 Summary: [R] Handle conversion of extra nested struct column
 Key: ARROW-11832
 URL: https://issues.apache.org/jira/browse/ARROW-11832
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Romain Francois
 Fix For: 4.0.0


Followup to ARROW-10570. See 
https://github.com/apache/arrow/pull/8650/#issuecomment-788404473





[jira] [Created] (ARROW-11864) [R] Document arrow.int64_downcast option

2021-03-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11864:
---

 Summary: [R] Document arrow.int64_downcast option
 Key: ARROW-11864
 URL: https://issues.apache.org/jira/browse/ARROW-11864
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Matthew Summersgill
 Fix For: 4.0.0


See ARROW-9083 and discussion.





[jira] [Created] (ARROW-11878) [C++] Improve Converter API to support chunking

2021-03-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11878:
---

 Summary: [C++] Improve Converter API to support chunking
 Key: ARROW-11878
 URL: https://issues.apache.org/jira/browse/ARROW-11878
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


We would like to be able to chunk a data frame when converting to an Arrow 
Table in R (see ARROW-9293). Apparently this is also not supported in pyarrow. 

[~romainfrancois] says two things need to happen: 

 - The Converter API needs to be able to Extend() a range of values, as opposed 
to the current API's {{Status Extend(SEXP x, int64_t size)}} override, which 
says "ingest this vector x, which has this many elements". 

 - The Chunker, or perhaps a new class, would sit on top of that, so that 
{{Chunker::Extend(x)}} could call {{Converter::Extend(x, start, size)}} 
multiple times, once for each chunk. 

The current chunker, I believe, solves a different problem: it is rooted in a 
Converter that deals with elements one by one, so that: 
  - if the element can be Append()ed, that's fine;
  - if not, it creates a new chunk and tries again.

The current chunker has a multiple-element method, but it's all or nothing: 

{code}
  // we could get bit smarter here since the whole batch of appendable values
  // will be rejected if a capacity error is raised
  Status Extend(InputType values, int64_t size) {
    auto status = converter_->Extend(values, size);
    if (ARROW_PREDICT_FALSE(status.IsCapacityError())) {
      if (converter_->builder()->length() == 0) {
        return status;
      }
      ARROW_RETURN_NOT_OK(FinishChunk());
      return Extend(values, size);
    }
    length_ += size;
    return status;
  }
{code}

This gives no way to say, e.g., "take this vector and chunk it into arrays of 
this size", which is what we want. 
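To sketch the proposed split, here is a C++ version with plain {{std::vector<int>}} standing in for {{SEXP}} and for Arrow builders; {{RangeConverter}} and {{RangeChunker}} are hypothetical names, not existing Arrow classes. The converter only knows how to append a range, and the chunker decides the ranges and finishes one array per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical converter: appends the range [start, start + size) of the
// input to the array currently being built.
class RangeConverter {
 public:
  void Extend(const std::vector<int>& x, std::size_t start, std::size_t size) {
    if (start > x.size() || size > x.size() - start) {
      throw std::out_of_range("range exceeds input length");
    }
    auto first = x.begin() + static_cast<std::ptrdiff_t>(start);
    current_.insert(current_.end(), first, first + static_cast<std::ptrdiff_t>(size));
  }

  // Return the array built so far and start a new, empty one.
  std::vector<int> Finish() {
    std::vector<int> done = std::move(current_);
    current_.clear();  // a moved-from vector is valid but unspecified
    return done;
  }

 private:
  std::vector<int> current_;
};

// Hypothetical chunker: splits the input into arrays of at most chunk_size
// elements by calling RangeConverter::Extend once per chunk.
class RangeChunker {
 public:
  explicit RangeChunker(std::size_t chunk_size) : chunk_size_(chunk_size) {
    if (chunk_size_ == 0) throw std::invalid_argument("chunk_size must be positive");
  }

  std::vector<std::vector<int>> Extend(const std::vector<int>& x) {
    std::vector<std::vector<int>> chunks;
    for (std::size_t start = 0; start < x.size(); start += chunk_size_) {
      const std::size_t size = std::min(chunk_size_, x.size() - start);
      converter_.Extend(x, start, size);
      chunks.push_back(converter_.Finish());
    }
    return chunks;
  }

 private:
  std::size_t chunk_size_;
  RangeConverter converter_;
};
```

With a chunk size of 2, an input of five values yields arrays of sizes 2, 2 and 1. Capacity errors are ignored in this sketch; a real implementation would still need a fallback like the one in the existing chunker's Extend().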

cc [~kszucs] [~bkietz]





[jira] [Created] (ARROW-11912) [R] Remove args from FeatherReader$create

2021-03-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11912:
---

 Summary: [R] Remove args from FeatherReader$create
 Key: ARROW-11912
 URL: https://issues.apache.org/jira/browse/ARROW-11912
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


They aren't used anymore because FeatherReader$create() now requires that you 
provide it a file connection. (We leaked connections before when it accepted a 
string file path and opened a connection if needed.)





[jira] [Created] (ARROW-11921) [R] Set LC_COLLATE in r/data-raw/codegen.R

2021-03-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11921:
---

 Summary: [R] Set LC_COLLATE in r/data-raw/codegen.R
 Key: ARROW-11921
 URL: https://issues.apache.org/jira/browse/ARROW-11921
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


So that the sort order of the generated wrapping code is stable across 
machines. Otherwise we'll keep thrashing on arrowExports.cpp whenever different 
people rebuild things (cf. 
https://github.com/apache/arrow/commit/21999ecd3cf2b9141e182c648eb13ab3836500d0#diff-f6ded32632f8b1516f0e852b8e648af02be39e60010c546a17502d1830245076).





[jira] [Created] (ARROW-11950) [C++] Add unary negative kernel

2021-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11950:
---

 Summary: [C++] Add unary negative kernel
 Key: ARROW-11950
 URL: https://issues.apache.org/jira/browse/ARROW-11950
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
Assignee: Eduardo Ponce
 Fix For: 4.0.0


Related to ARROW-11945. This would let you write an expression like {{-col}}. 
You can approximate it with {{0 - col}}, but I would guess a dedicated kernel 
could do it more efficiently.
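As a sketch of what a unary negation kernel does, here is element-wise negation over plain vectors (hypothetical function names, not Arrow's kernel API). It also shows one edge case any implementation must decide on: for signed integers, negating the minimum value overflows, so the integer version here reports failure instead of invoking undefined behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Element-wise negation of a floating-point column; always well defined.
std::vector<double> Negate(const std::vector<double>& col) {
  std::vector<double> out;
  out.reserve(col.size());
  for (double v : col) out.push_back(-v);
  return out;
}

// Checked element-wise negation of an int32 column. The negation of the
// minimum int32 value is not representable (signed overflow is undefined
// behavior in C++), so report failure with std::nullopt instead.
std::optional<std::vector<std::int32_t>> NegateChecked(
    const std::vector<std::int32_t>& col) {
  std::vector<std::int32_t> out;
  out.reserve(col.size());
  for (std::int32_t v : col) {
    if (v == std::numeric_limits<std::int32_t>::min()) return std::nullopt;
    out.push_back(-v);
  }
  return out;
}
```

Whether an overflowing input should error, wrap, or produce null is a design choice for the real kernel; this sketch simply rejects the whole column.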

cc [~bkietz]





[jira] [Created] (ARROW-11954) [C++] arrow/util/io_util.cc does not compile on Solaris

2021-03-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11954:
---

 Summary: [C++] arrow/util/io_util.cc does not compile on Solaris
 Key: ARROW-11954
 URL: https://issues.apache.org/jira/browse/ARROW-11954
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Neal Richardson


Looks similar to ARROW-11740

{code}
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc: In function ‘arrow::Status arrow::internal::MemoryMapRemap(void*, std::size_t, std::size_t, int, void**)’:
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc:1089:48: error: ‘MREMAP_MAYMOVE’ was not declared in this scope
  *new_addr = mremap(addr, old_size, new_size, MREMAP_MAYMOVE);
                                               ^
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc:1089:62: error: ‘mremap’ was not declared in this scope
  *new_addr = mremap(addr, old_size, new_size, MREMAP_MAYMOVE);
                                                             ^
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc: In function ‘arrow::Status arrow::internal::MemoryAdviseWillNeed(const std::vector&)’:
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc:1144:59: error: ‘POSIX_MADV_WILLNEED’ was not declared in this scope
      int err = posix_madvise(aligned.addr, aligned.size, POSIX_MADV_WILLNEED);
                                                          ^
/export/home/XI4sjNd/Rtemp/RtmpvN4Lx2/fileef105d2909/cpp/src/arrow/util/io_util.cc:1144:78: error: ‘posix_madvise’ was not declared in this scope
      int err = posix_madvise(aligned.addr, aligned.size, POSIX_MADV_WILLNEED);
                                                                             ^
{code}





[jira] [Created] (ARROW-11993) [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE

2021-03-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11993:
---

 Summary: [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
 Key: ARROW-11993
 URL: https://issues.apache.org/jira/browse/ARROW-11993
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson


It doesn't get used if SIMD level is NONE, so we shouldn't bother downloading 
it.

cc [~apitrou]





[jira] [Created] (ARROW-11994) [R] Build fails if dataset enabled but parquet is not

2021-03-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11994:
---

 Summary: [R] Build fails if dataset enabled but parquet is not
 Key: ARROW-11994
 URL: https://issues.apache.org/jira/browse/ARROW-11994
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson


Following ARROW-11735; discovered while working on ARROW-10734. The 
arrow::dataset::ParquetFileFormat and related classes require both dataset and 
parquet. The {{#if defined}} logic in r/src/dataset.cpp is correct and requires 
both, but the wrapper code generated for arrowExports.cpp uses only the function 
annotation, {{[[dataset::export]]}}. So the ParquetFileFormat methods in 
arrowExports.cpp are guarded only by ARROW_R_WITH_DATASET and fail to build if 
parquet is not available.

Not a priority to fix (for Solaris I can turn off ARROW_DATASET and avoid 
this); I just wanted to note it in case we need to revisit this wrapping logic 
later anyway. cc [~icook]





[jira] [Created] (ARROW-11996) [R] Make r/configure run successfully on Solaris

2021-03-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-11996:
---

 Summary: [R] Make r/configure run successfully on Solaris
 Key: ARROW-11996
 URL: https://issues.apache.org/jira/browse/ARROW-11996
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson


Replace some {{$()}} command substitutions with backticks, and use {{sed}} in a portable way.





[jira] [Created] (ARROW-12081) [R] Bindings for utf8_length

2021-03-24 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12081:
---

 Summary: [R] Bindings for utf8_length
 Key: ARROW-12081
 URL: https://issues.apache.org/jira/browse/ARROW-12081
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


Following ARROW-11693





[jira] [Created] (ARROW-12085) [R] Installation on ppc64le

2021-03-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12085:
---

 Summary: [R] Installation on ppc64le
 Key: ARROW-12085
 URL: https://issues.apache.org/jira/browse/ARROW-12085
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson


From https://github.com/apache/arrow/issues/9747





[jira] [Created] (ARROW-12094) [C++][R] Fix/workaround re2 building on clang/libc++

2021-03-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12094:
---

 Summary: [C++][R] Fix/workaround re2 building on clang/libc++
 Key: ARROW-12094
 URL: https://issues.apache.org/jira/browse/ARROW-12094
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson
 Fix For: 4.0.0


See https://github.com/apache/arrow/pull/8468#issuecomment-807807284. We either 
need to fix the build (maybe there's something not getting passed through to 
build_re2 correctly in cmake) or figure out the conditions under which the C++ 
build should turn off re2. 

See also ARROW-11736 to make regex compute functions optional in R tests.





[jira] [Created] (ARROW-12095) [CI][C++] Add nightly job to test offline build

2021-03-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12095:
---

 Summary: [CI][C++] Add nightly job to test offline build
 Key: ARROW-12095
 URL: https://issues.apache.org/jira/browse/ARROW-12095
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Continuous Integration
Reporter: Neal Richardson
 Fix For: 5.0.0


See discussion on https://github.com/apache/arrow/pull/9803





[jira] [Created] (ARROW-12128) [CI][Crossbow] Remove (or fix) test-ubuntu-16.04-cpp job

2021-03-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12128:
---

 Summary: [CI][Crossbow] Remove (or fix) test-ubuntu-16.04-cpp job
 Key: ARROW-12128
 URL: https://issues.apache.org/jira/browse/ARROW-12128
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Continuous Integration
Reporter: Neal Richardson
 Fix For: 4.0.0


ARROW-8049 raised the minimum CMake version required for bundled Thrift to 
3.10, which is newer than what Ubuntu 16.04 ships. We removed the 16.04 
packaging jobs in ARROW-11910 because 16.04 reaches EOL in April 2021, but we 
still have a failing nightly job and other related materials (Dockerfile, etc.) 
for 16.04.





[jira] [Created] (ARROW-12134) [C++] Add regex string match kernel

2021-03-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12134:
---

 Summary: [C++] Add regex string match kernel
 Key: ARROW-12134
 URL: https://issues.apache.org/jira/browse/ARROW-12134
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


We have a basic {{match_substring}} kernel already but not a regular expression 
one.





[jira] [Created] (ARROW-12137) [R] New/improved vignette on dplyr features

2021-03-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12137:
---

 Summary: [R] New/improved vignette on dplyr features
 Key: ARROW-12137
 URL: https://issues.apache.org/jira/browse/ARROW-12137
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Ian Cook
 Fix For: 4.0.0








[jira] [Created] (ARROW-12141) [R] Bindings for grepl

2021-03-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12141:
---

 Summary: [R] Bindings for grepl
 Key: ARROW-12141
 URL: https://issues.apache.org/jira/browse/ARROW-12141
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Depends on ARROW-12134. Use {{match_substring_regex}}, and {{match_substring}} 
for the {{fixed = TRUE}} version. Also map {{stringr::str_detect}} to these as 
appropriate.
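A minimal C++ sketch of the intended dispatch, using {{std::regex}} (ECMAScript syntax) purely for illustration. R's {{grepl()}} and Arrow's regex kernels each use their own regex dialects, so patterns will not always behave identically. The {{Grepl}} function is hypothetical and ignores missing values.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch of grepl(pattern, x, fixed = ...): fixed = true is a
// literal substring search (the match_substring case); otherwise the pattern
// is treated as a regular expression (the match_substring_regex case).
std::vector<bool> Grepl(const std::string& pattern,
                        const std::vector<std::string>& x, bool fixed) {
  std::vector<bool> out;
  out.reserve(x.size());
  if (fixed) {
    for (const std::string& s : x) out.push_back(s.find(pattern) != std::string::npos);
  } else {
    const std::regex re(pattern);  // compile once, reuse for every element
    for (const std::string& s : x) out.push_back(std::regex_search(s, re));
  }
  return out;
}
```

For example, with x = {"a.c", "abc", "xyz"}, the literal pattern "." matches only "a.c", while the regex "a.c" matches both "a.c" and "abc".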





[jira] [Created] (ARROW-12198) [R] bindings for strptime

2021-04-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12198:
---

 Summary: [R] bindings for strptime
 Key: ARROW-12198
 URL: https://issues.apache.org/jira/browse/ARROW-12198
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-12199) [R] bindings for stddev, variance

2021-04-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12199:
---

 Summary: [R] bindings for stddev, variance
 Key: ARROW-12199
 URL: https://issues.apache.org/jira/browse/ARROW-12199
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson








[jira] [Created] (ARROW-12197) [R] dplyr bindings for cast, dictionary_encode

2021-04-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12197:
---

 Summary: [R] dplyr bindings for cast, dictionary_encode
 Key: ARROW-12197
 URL: https://issues.apache.org/jira/browse/ARROW-12197
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-12200) [R] Export and document list_compute_functions

2021-04-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12200:
---

 Summary: [R] Export and document list_compute_functions
 Key: ARROW-12200
 URL: https://issues.apache.org/jira/browse/ARROW-12200
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 4.0.0


Since compute functions can now be called from dplyr, we should export 
list_compute_functions() so users can see what is available. Note that not all 
compute functions are suitable for use in filter/mutate, and some will require 
custom C++ wiring for their FunctionOptions. But many/most just work now.





[jira] [Created] (ARROW-12212) [R][CI] Test nightly on solaris

2021-04-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12212:
---

 Summary: [R][CI] Test nightly on solaris
 Key: ARROW-12212
 URL: https://issues.apache.org/jira/browse/ARROW-12212
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration, R
Reporter: Neal Richardson


Followup to ARROW-10734. It may be possible to set up a Solaris VM on GitHub 
Actions: we can try https://github.com/vmactions/solaris-vm with R from 
https://files.r-hub.io/opencsw/. A temporary solution could be a nightly r-hub 
build kicked off by the arrow-r-nightly CI, which would email me the results. 
Not ideal, but it would at least alert us to issues closer to when they are 
merged, not just at release time.





[jira] [Created] (ARROW-12213) [R] copy_files doesn't make it easy to copy a single file

2021-04-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12213:
---

 Summary: [R] copy_files doesn't make it easy to copy a single file
 Key: ARROW-12213
 URL: https://issues.apache.org/jira/browse/ARROW-12213
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson


copy_files (i.e. fs::CopyFiles) makes it trivial to recursively copy a 
directory/bucket to or from S3, but I'm having a hard time downloading a single 
file.

cc [~bkietz]





[jira] [Created] (ARROW-12236) [R][CI] Add check that all docs pages are listed in _pkgdown.yml

2021-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12236:
---

 Summary: [R][CI] Add check that all docs pages are listed in 
_pkgdown.yml
 Key: ARROW-12236
 URL: https://issues.apache.org/jira/browse/ARROW-12236
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration, R
Reporter: Neal Richardson


Our (external) nightly R packaging and docs build is failing to render the 
pkgdown site: 
https://github.com/ursa-labs/arrow-r-nightly/runs/2266551062?check_suite_focus=true#step:9:55

This is due to (1) a [new-ish change in 
pkgdown|https://github.com/r-lib/pkgdown/pull/1395] that errors if topics are 
not included and (2) the recent addition of FragmentScanOptions, which did not 
get added to _pkgdown.yml.

We should validate this on our regular CI in order to prevent future issues 
like this. We often have to add things to _pkgdown.yml right at release time, 
and it would be better to keep up as we go. Some ideas for how:

* Add a step to an existing R workflow (e.g. 
https://github.com/apache/arrow/blob/master/.github/workflows/r.yml#L60) that 
does this check
* Add a new workflow that is triggered only on changes to `r/man` and 
`r/_pkgdown.yml`
* In either case, this could be done as a bash script, a Python script, or an R 
script. If using R, note that the docker-based CI jobs won't have R installed, 
so you might want to tack it onto one of the Windows jobs (which use the 
setup-r action), but then you're on Windows.
* You could install pkgdown and try to build the site, but that's a lot of 
dependencies to download and install just to compare some lines in a 
yaml file with a directory listing (i.e., make sure that all {{r/man/*.Rd}} 
have corresponding entries in the reference part of the yml), so a Python or even 
a bash script might be more efficient to run. And since this is going to run a 
lot, it's worth keeping the runtime down, even if that means more work to set 
it up.
* If you're scripting this standalone, I think you'll need to filter out Rd files 
that have {{\keyword{internal}}}, as pkgdown excludes those from the reference 
list.





[jira] [Created] (ARROW-12304) [R] Update news and polish docs for 4.0

2021-04-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12304:
---

 Summary: [R] Update news and polish docs for 4.0
 Key: ARROW-12304
 URL: https://issues.apache.org/jira/browse/ARROW-12304
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0








[jira] [Created] (ARROW-12316) [C++] Switch default memory allocator from jemalloc to mimalloc

2021-04-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12316:
---

 Summary: [C++] Switch default memory allocator from jemalloc to 
mimalloc
 Key: ARROW-12316
 URL: https://issues.apache.org/jira/browse/ARROW-12316
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Neal Richardson
 Fix For: 4.0.0


Benchmarking shows that mimalloc seems to be faster on real workflows (at least 
on macOS; we're still collecting data on Ubuntu). We could change the default memory 
pool selection so that mimalloc is preferred over jemalloc.

cc [~jonkeane] [~apitrou]





[jira] [Created] (ARROW-12356) [Website] Update install page instructions to point to artifactory

2021-04-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12356:
---

 Summary: [Website] Update install page instructions to point to 
artifactory
 Key: ARROW-12356
 URL: https://issues.apache.org/jira/browse/ARROW-12356
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 4.0.0


Looks like packages for old versions have been moved over, even if we can't 
upload new ones yet.





[jira] [Created] (ARROW-12370) [R] Bindings for power kernel

2021-04-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12370:
---

 Summary: [R] Bindings for power kernel
 Key: ARROW-12370
 URL: https://issues.apache.org/jira/browse/ARROW-12370
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 5.0.0


The C++ kernel was implemented in ARROW-11070. There is a TODO in expression.R that 
references this issue.





[jira] [Created] (ARROW-12571) [R][CI] Run nightly R with valgrind

2021-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12571:
---

 Summary: [R][CI] Run nightly R with valgrind
 Key: ARROW-12571
 URL: https://issues.apache.org/jira/browse/ARROW-12571
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration, R
Reporter: Neal Richardson
 Fix For: 5.0.0


The wch/r-debug container in which we run the ASAN/UBSAN sanitizer job also has a 
valgrind-enabled build of R: 
https://github.com/wch/r-debug#docker-image-for-debugging-r-memory-problems

According to https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt, we 
should possibly also run R CMD check with --use-valgrind.





[jira] [Created] (ARROW-12575) [R] Use unary negative kernel

2021-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12575:
---

 Summary: [R] Use unary negative kernel
 Key: ARROW-12575
 URL: https://issues.apache.org/jira/browse/ARROW-12575
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 5.0.0


Follow-up to ARROW-11950. Grep for that issue number in the r directory to see 
where to make changes. 
https://github.com/apache/arrow/pull/10113/files#diff-ce5b94577014735990903d3d03bd4ea4b8c8e6d32f5227592e60b7dd6a912d59
shows what the new compute function is called.





[jira] [Created] (ARROW-12620) [C++] Dataset writing can only include projected columns if input columns are also included

2021-04-30 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12620:
---

 Summary: [C++] Dataset writing can only include projected columns 
if input columns are also included
 Key: ARROW-12620
 URL: https://issues.apache.org/jira/browse/ARROW-12620
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 4.0.0
Reporter: Neal Richardson


I discovered this while working on https://github.com/apache/arrow/pull/10191. 
You can project new columns when writing a dataset, but only if they are 
derived from columns that are included in the output. Here's an R-based example:

{code}
library(arrow)
library(dplyr)

# Simple function to write and re-open the new dataset
write_then_open <- function(ds, path, ...) {
  write_dataset(ds, path, ...)
  open_dataset(path)
}

ds_dir <- tempfile()
tab <- Table$create(a = 1:5)

tab %>% 
  write_then_open(ds_dir) %>%
  collect()

# # A tibble: 5 x 1
#       a
#   <int>
# 1     1
# 2     2
# 3     3
# 4     4
# 5     5

# If you rename a column, it's all nulls
tab %>%
  select(b = a) %>%
  write_then_open(ds_dir) %>%
  collect()

# # A tibble: 5 x 1
#       b
#   <int>
# 1    NA
# 2    NA
# 3    NA
# 4    NA
# 5    NA

# If you derive a new column and keep the original, it works
tab %>%
  mutate(b = a) %>%
  write_then_open(ds_dir) %>%
  collect()

# # A tibble: 5 x 2
#       a     b
#   <int> <int>
# 1     1     1
# 2     2     2
# 3     3     3
# 4     4     4
# 5     5     5

# transmute() only keeps the added columns, so it also illustrates the failure
tab %>%
  transmute(b = a) %>%
  write_then_open(ds_dir) %>%
  collect()

# # A tibble: 5 x 1
#       b
#   <int>
# 1    NA
# 2    NA
# 3    NA
# 4    NA
# 5    NA
{code}





[jira] [Created] (ARROW-12633) [C++] Query engine v0 umbrella issue

2021-05-03 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12633:
---

 Summary: [C++] Query engine v0 umbrella issue
 Key: ARROW-12633
 URL: https://issues.apache.org/jira/browse/ARROW-12633
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python, R
Reporter: Neal Richardson
 Fix For: 5.0.0








[jira] [Created] (ARROW-12689) [R] Implement ArrowArrayStream C interface

2021-05-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12689:
---

 Summary: [R] Implement ArrowArrayStream C interface
 Key: ARROW-12689
 URL: https://issues.apache.org/jira/browse/ARROW-12689
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
 Fix For: 5.0.0


See 
https://github.com/apache/arrow/commit/97879eb970bac52d93d2247200b9ca7acf6f3f93,
 which adds it and also adds Python bindings. 





[jira] [Created] (ARROW-12688) [R] Use DuckDB to query an Arrow Dataset

2021-05-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12688:
---

 Summary: [R] Use DuckDB to query an Arrow Dataset
 Key: ARROW-12688
 URL: https://issues.apache.org/jira/browse/ARROW-12688
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson


DuckDB can read data from an Arrow C-interface stream. Once we can provide that 
struct from R, presumably DuckDB could query that stream.

A first step is just connecting the pieces. A second step would be to handle 
parts of the DuckDB query and push down filtering/projection to Arrow. 





[jira] [Created] (ARROW-12694) [R] rtools35 job failing on 32-bit build tests

2021-05-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12694:
---

 Summary: [R] rtools35 job failing on 32-bit build tests
 Key: ARROW-12694
 URL: https://issues.apache.org/jira/browse/ARROW-12694
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, R
Reporter: Neal Richardson


See 
https://github.com/apache/arrow/actions/workflows/r.yml?query=branch%3Amaster: 
this started when ARROW-9697 (CountRows for Scanner) was merged. It's only failing 
on rtools35 (aka gcc 4.9), and only on the 32-bit build (i386). Since there's 
no output about what failed, it's probably a segfault. The easiest way to get 
more information is to flip this {{if: false}} to true and let it print 
detailed output about where it was when it died: 
https://github.com/apache/arrow/blob/master/.github/workflows/r.yml#L186





[jira] [Created] (ARROW-12731) [R] Use InMemoryDataset for Table/RecordBatch in dplyr code

2021-05-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-12731:
---

 Summary: [R] Use InMemoryDataset for Table/RecordBatch in dplyr 
code
 Key: ARROW-12731
 URL: https://issues.apache.org/jira/browse/ARROW-12731
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 5.0.0


This lets us consolidate our Expression handling code and prepares us for more 
query evaluation in the near future. As a bonus, it should also simplify our 
dplyr NSE function definitions and make it easier to add and test new ones going 
forward.




