[jira] [Created] (ARROW-9112) [R] Update autobrew script location

2020-06-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9112:
--

 Summary: [R] Update autobrew script location
 Key: ARROW-9112
 URL: https://issues.apache.org/jira/browse/ARROW-9112
 Project: Apache Arrow
  Issue Type: Task
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Jeroen is moving it to a different location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9083) [R] collect int64 as R integer type if not out of bounds

2020-06-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9083:
--

 Summary: [R] collect int64 as R integer type if not out of bounds
 Key: ARROW-9083
 URL: https://issues.apache.org/jira/browse/ARROW-9083
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson


{{bit64::integer64}} can be awkward to work with in R (one example: 
https://github.com/apache/arrow/issues/7385). Often in Arrow we get {{int64}} 
types from [compute methods|https://github.com/apache/arrow/pull/7308] or other 
translation methods that auto-promote to the largest integer type, but they 
would fit fine in a 32-bit integer, which is R's native type. 

When calling {{Array__as_vector}} on an int64, we could first call the minmax 
function on the array, and if the extrema are within the range of a 32-bit int, 
return a regular R integer vector. This would add a little bit of ambiguity as 
to what R type you'll get from an Arrow type, but I wonder if the benefits are 
worth it since you can't do much with an integer64 in R. (We could also make 
this optional, similar to ARROW-7657, so you could specify a "strict" mode if 
you are in a use case where roundtrip fidelity is more important than R 
usability.)
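
A minimal sketch of that check in base R, not the actual binding code: doubles stand in for the int64 values, and {{downcast_if_fits}} and its {{strict}} argument are hypothetical names used only for illustration.

{code:r}
# Hypothetical helper; `x` is a double vector standing in for int64 values.
downcast_if_fits <- function(x, strict = FALSE) {
  # "strict" mode: keep the wide type for roundtrip fidelity
  if (strict) return(x)
  vals <- x[!is.na(x)]
  # Nothing but NAs (or empty): an integer vector can hold that
  if (length(vals) == 0) return(as.integer(x))
  # Compare the extrema against the range of R's 32-bit integer type
  fits <- min(vals) >= -.Machine$integer.max &&
    max(vals) <= .Machine$integer.max
  if (fits) as.integer(x) else x
}

class(downcast_if_fits(c(1, 2, NA)))
class(downcast_if_fits(c(1, 2^31)))
{code}

The lower bound is {{-.Machine$integer.max}} rather than -2^31 because R reserves that bit pattern for {{NA_integer_}}.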





[jira] [Created] (ARROW-9070) [C++] StructScalar needs field accessor methods

2020-06-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9070:
--

 Summary: [C++] StructScalar needs field accessor methods
 Key: ARROW-9070
 URL: https://issues.apache.org/jira/browse/ARROW-9070
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


The minmax compute function returns a struct with fields "min" and "max". So to 
write an R binding for the {{min()}} method on arrow objects, I call "minmax" 
and then take the "min" field from the result. However, at least from my 
reading of scalar.h compared with array_nested.h, there are no 
field/GetFieldByName/etc. methods for StructScalar, so I can't get it.
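
As a rough illustration of the intended binding, here is a base R stand-in in which a named list plays the role of the struct. The {{minmax}} and {{min_via_minmax}} functions below are hypothetical and do not call Arrow at all.

{code:r}
# Stand-in for the "minmax" compute function: returns both extrema at once,
# with a named list in place of the struct scalar.
minmax <- function(x, na.rm = TRUE) {
  list(min = min(x, na.rm = na.rm), max = max(x, na.rm = na.rm))
}

# min() is then just a field lookup on the struct-like result; this lookup is
# the step that needs a field accessor on StructScalar.
min_via_minmax <- function(x) minmax(x)[["min"]]

min_via_minmax(c(3, 1, 2))
{code}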





[jira] [Created] (ARROW-9069) [C++] MakeArrayFromScalar can't handle struct

2020-06-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9069:
--

 Summary: [C++] MakeArrayFromScalar can't handle struct
 Key: ARROW-9069
 URL: https://issues.apache.org/jira/browse/ARROW-9069
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


The R bindings translate data to/from Scalars by using the Array methods 
already implemented: to go from R object to a Scalar, it creates a length-1 
Array and then slices out the 0th element with GetScalar(); to go from Scalar 
to R object, it calls MakeArrayFromScalar and then the as.vector method on that 
Array (in R, there is no scalar type anyway, only length-1 vectors). 

This generally works fine but if I get a Struct scalar (as the minmax compute 
function returns), I can't do anything with it because MakeArrayFromScalar 
doesn't work with structs.





[jira] [Created] (ARROW-9056) [C++] Aggregation methods for Scalars?

2020-06-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9056:
--

 Summary: [C++] Aggregation methods for Scalars?
 Key: ARROW-9056
 URL: https://issues.apache.org/jira/browse/ARROW-9056
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


See discussion on https://github.com/apache/arrow/pull/7308. Many/most would 
no-op (sum, mean, min, max), but maybe they should exist and not error? Maybe 
they're not needed, but I could see how you might invoke a function on the 
result of a previous aggregation or something.





[jira] [Created] (ARROW-9055) [C++] Add sum/mean kernels for Boolean type

2020-06-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9055:
--

 Summary: [C++] Add sum/mean kernels for Boolean type
 Key: ARROW-9055
 URL: https://issues.apache.org/jira/browse/ARROW-9055
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


See https://github.com/apache/arrow/pull/7308 (ARROW-6978)





[jira] [Created] (ARROW-9054) [C++] Add ScalarAggregateOptions

2020-06-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9054:
--

 Summary: [C++] Add ScalarAggregateOptions
 Key: ARROW-9054
 URL: https://issues.apache.org/jira/browse/ARROW-9054
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


See discussion on https://github.com/apache/arrow/pull/7308. MinMax has an 
option for null behavior, but Sum and Mean do not.





[jira] [Created] (ARROW-9046) [C++][R] Put more things in type_fwds

2020-06-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9046:
--

 Summary: [C++][R] Put more things in type_fwds
 Key: ARROW-9046
 URL: https://issues.apache.org/jira/browse/ARROW-9046
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 1.0.0


Hopefully to reduce compile time.





[jira] [Created] (ARROW-8984) [R] Revise install guides now that Windows conda package exists

2020-05-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8984:
--

 Summary: [R] Revise install guides now that Windows conda package 
exists
 Key: ARROW-8984
 URL: https://issues.apache.org/jira/browse/ARROW-8984
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-8976) [C++] compute::CallFunction can't Filter/Take with ChunkedArray

2020-05-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8976:
--

 Summary: [C++] compute::CallFunction can't Filter/Take with 
ChunkedArray
 Key: ARROW-8976
 URL: https://issues.apache.org/jira/browse/ARROW-8976
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-8938

{{Invalid: Kernel does not support chunked array arguments}}





[jira] [Created] (ARROW-8899) [R] Add R metadata like pandas metadata for round-trip fidelity

2020-05-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8899:
--

 Summary: [R] Add R metadata like pandas metadata for round-trip 
fidelity
 Key: ARROW-8899
 URL: https://issues.apache.org/jira/browse/ARROW-8899
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


Arrow Schema and Field objects have custom_metadata fields to store arbitrary 
strings in a key-value store. Pandas stores JSON in a "pandas" key and uses 
that to improve the fidelity of round-tripping data to Arrow/Parquet/Feather 
and back. 
https://pandas.pydata.org/docs/dev/development/developer.html#storing-pandas-dataframe-objects-in-apache-parquet-format
 describes this a bit.

You can see this pandas metadata in the sample Parquet file:

{code:r}
tab <- read_parquet(system.file("v0.7.1.parquet", package="arrow"), 
as_data_frame = FALSE)
tab

# Table
# 10 rows x 11 columns
# $carat 
# $cut 
# $color 
# $clarity 
# $depth 
# $table 
# $price 
# $x 
# $y 
# $z 
# $__index_level_0__ 

tab$metadata

# $pandas
# [1] "{\"index_columns\": [\"__index_level_0__\"], \"column_indexes\": 
[{\"name\": null, \"pandas_type\": \"string\", \"numpy_type\": \"object\", 
\"metadata\": null}], \"columns\": [{\"name\": \"carat\", \"pandas_type\": 
\"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": 
\"cut\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", 
\"metadata\": null}, {\"name\": \"color\", \"pandas_type\": \"unicode\", 
\"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"clarity\", 
\"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, 
{\"name\": \"depth\", \"pandas_type\": \"float64\", \"numpy_type\": 
\"float64\", \"metadata\": null}, {\"name\": \"table\", \"pandas_type\": 
\"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": 
\"price\", \"pandas_type\": \"int64\", \"numpy_type\": \"int64\", \"metadata\": 
null}, {\"name\": \"x\", \"pandas_type\": \"float64\", \"numpy_type\": 
\"float64\", \"metadata\": null}, {\"name\": \"y\", \"pandas_type\": 
\"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": 
\"z\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": 
null}, {\"name\": \"__index_level_0__\", \"pandas_type\": \"int64\", 
\"numpy_type\": \"int64\", \"metadata\": null}], \"pandas_version\": 
\"0.20.1\"}"
{code}

We should do something similar in R: store the "attributes" for each column in 
a data.frame when we convert to Arrow, and restore those attributes when we 
read from Arrow. 

Since ARROW-8703, you could naively do this all in R, something like:

{code:r}
tab$metadata$r <- lapply(df, attributes)
{code}

on the conversion to Arrow, and in as.data.frame(), do

{code:r}
if (!is.null(tab$metadata$r)) {
  df[] <- mapply(function(col, meta) {
    attributes(col) <- meta
    col  # return the column itself, not the value of the assignment
  }, col = df, meta = tab$metadata$r, SIMPLIFY = FALSE)
}
{code}

However, it's trickier than this because:

* {{tab$metadata$r}} needs to be serialized to string and deserialized on the 
way back. Pandas uses JSON but arrow doesn't currently have a JSON R 
dependency. The C++ build does include rapidjson, maybe we could tap into that? 
Alternatively, we could {{dput()}} to dump the R attributes, which might have 
higher fidelity in addition to zero dependencies, but there are tradeoffs.
* We'll need to do the same for all places where Tables and RecordBatches are 
created/converted
* We'll need to make sure that nested types (structs) get the same coverage
* This metadata is only attached to Schemas, meaning that Arrays/ChunkedArrays 
don't have a place to store extra metadata. So we probably want to attach to 
the R6 (Chunked)Array objects a metadata/attributes field so that if we convert 
an R vector to array, or if we extract an array out of a record batch, we don't 
lose the attributes.
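
To make the first bullet concrete, here is a minimal base R sketch of the zero-dependency option. It uses {{deparse()}}, which produces the same text that {{dput()}} prints, and {{eval(parse())}} to read it back. The helper names {{serialize_attrs}} and {{restore_attrs}} and the "label" attribute are hypothetical, and nothing here touches Arrow.

{code:r}
# Hypothetical helper: dump every column's attributes to one string
serialize_attrs <- function(df) {
  paste(deparse(lapply(df, attributes)), collapse = "\n")
}

# Hypothetical helper: parse that string and reapply the attributes
restore_attrs <- function(df, txt) {
  meta <- eval(parse(text = txt))
  for (nm in names(meta)) {
    # Columns without attributes were stored as NULL; leave them alone
    if (!is.null(meta[[nm]])) attributes(df[[nm]]) <- meta[[nm]]
  }
  df
}

df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))
attr(df$x, "label") <- "an example label"
txt <- serialize_attrs(df)

# Simulate data coming back from Arrow without its R attributes
plain <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))
restored <- restore_attrs(plain, txt)
attr(restored$x, "label")
{code}

One of the tradeoffs is visible here: reading the string back means evaluating parsed R code, which is a concern if the file comes from an untrusted source.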

Doing this should resolve ARROW-4390 and make ARROW-8867 trivial as well.

Finally, a note about this custom metadata vs. extension types. Extension types 
can be defined by [adding metadata to a 
Field|https://arrow.apache.org/docs/format/Columnar.html#extension-types] (in a 
Schema). I think this is out of scope here because we're only concerned with R 
roundtrip fidelity. If there were a type that (for example) R and Pandas both 
had that Arrow did not, we could define an extension type so that we could 
share that across the implementations. But unless/until there is value in 
establishing that extension type standard, let's not worry about it. (In other 
words, in R we should ignore pandas metadata; if there's anything that pandas 
wants to share with R, it will define it somewhere else.)





[jira] [Created] (ARROW-8885) [R] Don't include everything everywhere

2020-05-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8885:
--

 Summary: [R] Don't include everything everywhere
 Key: ARROW-8885
 URL: https://issues.apache.org/jira/browse/ARROW-8885
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


I noticed that we were jamming all of our arrow #includes in one header file in 
the R bindings and then including that everywhere. Seemed like that was 
wasteful and probably causing compilation to be slower.





[jira] [Created] (ARROW-8864) [R] Add methods to Table/RecordBatch for consistency with data.frame

2020-05-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8864:
--

 Summary: [R] Add methods to Table/RecordBatch for consistency with 
data.frame
 Key: ARROW-8864
 URL: https://issues.apache.org/jira/browse/ARROW-8864
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Some methods identified in the Feather package test suite





[jira] [Created] (ARROW-8857) [CI] MinGW builds break on system upgrade

2020-05-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8857:
--

 Summary: [CI] MinGW builds break on system upgrade
 Key: ARROW-8857
 URL: https://issues.apache.org/jira/browse/ARROW-8857
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R, Ruby
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See e.g. 
https://github.com/apache/arrow/pull/7218/checks?check_run_id=687127263#step:7:69

Started failing sometime today.





[jira] [Created] (ARROW-8852) [R] Post-0.17.1 adjustments

2020-05-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8852:
--

 Summary: [R] Post-0.17.1 adjustments
 Key: ARROW-8852
 URL: https://issues.apache.org/jira/browse/ARROW-8852
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-8826) [Crossbow] remote URL should always have .git

2020-05-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8826:
--

 Summary: [Crossbow] remote URL should always have .git
 Key: ARROW-8826
 URL: https://issues.apache.org/jira/browse/ARROW-8826
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Developer Tools
Reporter: Neal Richardson
Assignee: Neal Richardson


In ARROW-7803, I edited the crossbow templates for the homebrew jobs to 
substitute in the correct fork of arrow and append the current git SHA so that 
the code under test corresponds to the requested git commit. Unfortunately, 
this caused the nightly builds to fail. 

Comparing a successful on-demand run 
(https://github.com/ursa-labs/crossbow/blob/actions-266-travis-homebrew-r-autobrew/.travis.yml)
 with a nightly run 
(https://github.com/ursa-labs/crossbow/blob/nightly-2020-05-16-0-travis-homebrew-cpp/.travis.yml),
 it appears that the default "remote" URL that crossbow uses when not on a 
fork/PR does not contain the ".git" suffix. I suspect that Homebrew requires 
that suffix to identify the source as a git repo and clone it correctly.
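
The normalization itself is small. The R sketch below only illustrates the idea and is not the crossbow code; {{ensure_git_suffix}} is a hypothetical helper name.

{code:r}
# Hypothetical helper: append ".git" to a remote URL unless it is already there
ensure_git_suffix <- function(url) {
  ifelse(grepl("\\.git$", url), url, paste0(url, ".git"))
}

ensure_git_suffix("https://github.com/apache/arrow")
ensure_git_suffix("https://github.com/apache/arrow.git")
{code}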





[jira] [Created] (ARROW-8804) [R][CI] Followup to Rtools40 upgrade

2020-05-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8804:
--

 Summary: [R][CI] Followup to Rtools40 upgrade
 Key: ARROW-8804
 URL: https://issues.apache.org/jira/browse/ARROW-8804
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8768) [R][CI] Fix nightly as-cran spurious failure

2020-05-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8768:
--

 Summary: [R][CI] Fix nightly as-cran spurious failure
 Key: ARROW-8768
 URL: https://issues.apache.org/jira/browse/ARROW-8768
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


An extra check we added to ensure that the package doesn't write anything to 
the user's home directory started failing on one of the 5 as-cran checks. It 
appears that a new feature of texlive2020, which is apparently invoked on 
checking that the pdf manual can be built, adds some caching junk to the home 
dir. It is unlikely that this is a real failure, probably just an artifact of 
the test environment. 





[jira] [Created] (ARROW-8758) [R] Updates for compatibility with dplyr 1.0

2020-05-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8758:
--

 Summary: [R] Updates for compatibility with dplyr 1.0
 Key: ARROW-8758
 URL: https://issues.apache.org/jira/browse/ARROW-8758
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0, 0.17.1








[jira] [Created] (ARROW-8718) [R] Add str() methods to objects

2020-05-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8718:
--

 Summary: [R] Add str() methods to objects
 Key: ARROW-8718
 URL: https://issues.apache.org/jira/browse/ARROW-8718
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Apparently this will make the RStudio IDE show useful things in the environment 
panel.





[jira] [Created] (ARROW-8717) [CI][Packaging] Add build dependency on boost to homebrew

2020-05-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8717:
--

 Summary: [CI][Packaging] Add build dependency on boost to homebrew
 Key: ARROW-8717
 URL: https://issues.apache.org/jira/browse/ARROW-8717
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


cf. https://github.com/Homebrew/homebrew-core/pull/54287

and revise the Travis jobs to uninstall boost and thrift before checking the 
formula





[jira] [Created] (ARROW-8699) [R] Fix automatic r_to_py conversion

2020-05-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8699:
--

 Summary: [R] Fix automatic r_to_py conversion
 Key: ARROW-8699
 URL: https://issues.apache.org/jira/browse/ARROW-8699
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See https://github.com/rstudio/reticulate/issues/748





[jira] [Created] (ARROW-8624) [Packaging] Linux system packages aren't building with ARROW_DATASET=ON

2020-04-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8624:
--

 Summary: [Packaging] Linux system packages aren't building with 
ARROW_DATASET=ON
 Key: ARROW-8624
 URL: https://issues.apache.org/jira/browse/ARROW-8624
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Affects Versions: 0.17.0
Reporter: Neal Richardson


I've seen a few reports like https://github.com/apache/arrow/issues/7055, where 
the user reports that they've installed the arrow system packages, we can see 
that they exist, but {{pkg-config}} reports that it doesn't have them. I think 
this is because {{-larrow_dataset}} isn't found. As the output on that issue 
shows, while arrow core headers and libraries are there, arrow_dataset is not.

Searching through the packaging scripts (such as 
https://github.com/apache/arrow/blob/master/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in),
 while there is some metadata about a dataset package, I see that 
ARROW_DATASET=ON is not set anywhere, so I don't think we're building it. 





[jira] [Created] (ARROW-8607) [R][CI] Unbreak builds following R 4.0 release

2020-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8607:
--

 Summary: [R][CI] Unbreak builds following R 4.0 release
 Key: ARROW-8607
 URL: https://issues.apache.org/jira/browse/ARROW-8607
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Just a tourniquet to get master passing again while I work on ARROW-8604.





[jira] [Created] (ARROW-8606) [CI] Don't trigger all builds on a change to any file in ci/

2020-04-27 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8606:
--

 Summary: [CI] Don't trigger all builds on a change to any file in 
ci/
 Key: ARROW-8606
 URL: https://issues.apache.org/jira/browse/ARROW-8606
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8575) [Developer] Add issue_comment workflow to rebase a PR

2020-04-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8575:
--

 Summary: [Developer] Add issue_comment workflow to rebase a PR
 Key: ARROW-8575
 URL: https://issues.apache.org/jira/browse/ARROW-8575
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8569) [CI] Upgrade xcode version for testing homebrew formulae

2020-04-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8569:
--

 Summary: [CI] Upgrade xcode version for testing homebrew formulae
 Key: ARROW-8569
 URL: https://issues.apache.org/jira/browse/ARROW-8569
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


So that fewer bottles have to be built from source.





[jira] [Created] (ARROW-8550) [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8550:
--

 Summary: [CI] Don't run cron GHA jobs on forks
 Key: ARROW-8550
 URL: https://issues.apache.org/jira/browse/ARROW-8550
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson


It's wasteful, and I'm tired of seeing them clogging up my Actions tab and 
notifications. 





[jira] [Created] (ARROW-8549) [R] Assorted post-0.17 release cleanups

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8549:
--

 Summary: [R] Assorted post-0.17 release cleanups
 Key: ARROW-8549
 URL: https://issues.apache.org/jira/browse/ARROW-8549
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-8548) [Website] 0.17 release post

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8548:
--

 Summary: [Website] 0.17 release post
 Key: ARROW-8548
 URL: https://issues.apache.org/jira/browse/ARROW-8548
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8538) [Packaging] Remove boost from homebrew formula

2020-04-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8538:
--

 Summary: [Packaging] Remove boost from homebrew formula
 Key: ARROW-8538
 URL: https://issues.apache.org/jira/browse/ARROW-8538
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8489) [Developer] Autotune more things

2020-04-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8489:
--

 Summary: [Developer] Autotune more things
 Key: ARROW-8489
 URL: https://issues.apache.org/jira/browse/ARROW-8489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Python
Reporter: Neal Richardson


ARROW-7801 added the "autotune" comment bot to fix linting errors and rebuild 
some generated files. cmake-format was left off because of Python problems (see 
description on https://github.com/apache/arrow/pull/6932). There are probably 
other things we want to add, too (autopep8 for Python, and similar for other 
languages?)





[jira] [Created] (ARROW-8475) [CI][Crossbow] Rehabilitate (or delete) hiveserver2 nightly job

2020-04-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8475:
--

 Summary: [CI][Crossbow] Rehabilitate (or delete) hiveserver2 
nightly job
 Key: ARROW-8475
 URL: https://issues.apache.org/jira/browse/ARROW-8475
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson


Disabled in ARROW-8474 cc [~wesm]





[jira] [Created] (ARROW-8474) [CI][Crossbow] Skip some nightlies we don't need to run

2020-04-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8474:
--

 Summary: [CI][Crossbow] Skip some nightlies we don't need to run
 Key: ARROW-8474
 URL: https://issues.apache.org/jira/browse/ARROW-8474
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8449) [R] Use CMAKE_UNITY_BUILD everywhere

2020-04-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8449:
--

 Summary: [R] Use CMAKE_UNITY_BUILD everywhere
 Key: ARROW-8449
 URL: https://issues.apache.org/jira/browse/ARROW-8449
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0








[jira] [Created] (ARROW-8433) [R] Add feather alias for ipc format in dataset API

2020-04-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8433:
--

 Summary: [R] Add feather alias for ipc format in dataset API
 Key: ARROW-8433
 URL: https://issues.apache.org/jira/browse/ARROW-8433
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


cf. ARROW-8416





[jira] [Created] (ARROW-8390) [R] Expose schema unification features

2020-04-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8390:
--

 Summary: [R] Expose schema unification features
 Key: ARROW-8390
 URL: https://issues.apache.org/jira/browse/ARROW-8390
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0








[jira] [Created] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2020-04-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8379:
--

 Summary: [R] Investigate/fix thread safety issues (esp. Windows)
 Key: ARROW-8379
 URL: https://issues.apache.org/jira/browse/ARROW-8379
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson


There have been a number of issues where the R bindings' multithreading has 
been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 I 
disabled {{use_threads}} in the Windows tests, and it appeared that the 
mysterious Windows segfaults stopped. We should fix whatever the underlying 
issues are.





[jira] [Created] (ARROW-8377) [CI][C++][R] Build and run C++ tests on Rtools build

2020-04-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8377:
--

 Summary: [CI][C++][R] Build and run C++ tests on Rtools build
 Key: ARROW-8377
 URL: https://issues.apache.org/jira/browse/ARROW-8377
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Continuous Integration, R
Reporter: Neal Richardson


Maybe this will better identify our unexplained segfaults





[jira] [Created] (ARROW-8376) [R] Add experimental interface to ScanTask/RecordBatch iterators

2020-04-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8376:
--

 Summary: [R] Add experimental interface to ScanTask/RecordBatch 
iterators
 Key: ARROW-8376
 URL: https://issues.apache.org/jira/browse/ARROW-8376
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8375) [CI][R] Make Windows tests more verbose in case of segfault

2020-04-08 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8375:
--

 Summary: [CI][R] Make Windows tests more verbose in case of 
segfault
 Key: ARROW-8375
 URL: https://issues.apache.org/jira/browse/ARROW-8375
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8369) [CI] Fix crossbow R group

2020-04-07 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8369:
--

 Summary: [CI] Fix crossbow R group
 Key: ARROW-8369
 URL: https://issues.apache.org/jira/browse/ARROW-8369
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson


This was broken in ARROW-8356





[jira] [Created] (ARROW-8353) [C++] is_nullable maybe not initialized in parquet writer

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8353:
--

 Summary: [C++] is_nullable maybe not initialized in parquet writer
 Key: ARROW-8353
 URL: https://issues.apache.org/jira/browse/ARROW-8353
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson


From the Rtools build:

{code}
[ 84%] Building CXX object 
src/parquet/CMakeFiles/parquet_static.dir/column_reader.cc.obj
In file included from D:/a/arrow/arrow/cpp/src/arrow/io/concurrency.h:23:0,
 from D:/a/arrow/arrow/cpp/src/arrow/io/memory.h:25,
 from D:/a/arrow/arrow/cpp/src/parquet/platform.h:25,
 from D:/a/arrow/arrow/cpp/src/parquet/arrow/writer.h:23,
 from D:/a/arrow/arrow/cpp/src/parquet/arrow/writer.cc:18:
D:/a/arrow/arrow/cpp/src/arrow/result.h: In member function 'virtual 
arrow::Status parquet::arrow::FileWriterImpl::WriteColumnChunk(const 
std::shared_ptr&, int64_t, int64_t)':
D:/a/arrow/arrow/cpp/src/arrow/result.h:428:28: warning: 'is_nullable' may be 
used uninitialized in this function [-Wmaybe-uninitialized]
   auto result_name = (rexpr);   \
^
D:/a/arrow/arrow/cpp/src/parquet/arrow/writer.cc:430:10: note: 'is_nullable' 
was declared here
 bool is_nullable;
  ^
{code}

I'd give it a default value, but I'm not sure it's that simple.





[jira] [Created] (ARROW-8352) [R] Add install_pyarrow()

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8352:
--

 Summary: [R] Add install_pyarrow()
 Key: ARROW-8352
 URL: https://issues.apache.org/jira/browse/ARROW-8352
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Neal Richardson
Assignee: Neal Richardson


To facilitate installing pyarrow for use with reticulate, including support for
the nightly packages.





[jira] [Created] (ARROW-8351) [R][CI] Store the Rtools-built Arrow C++ library as a build artifact

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8351:
--

 Summary: [R][CI] Store the Rtools-built Arrow C++ library as a 
build artifact
 Key: ARROW-8351
 URL: https://issues.apache.org/jira/browse/ARROW-8351
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Neal Richardson
Assignee: Neal Richardson


To help with debugging unexplained segfaults.





[jira] [Created] (ARROW-8346) [CI][Ruby] GLib/Ruby macOS build fails on zlib

2020-04-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8346:
--

 Summary: [CI][Ruby] GLib/Ruby macOS build fails on zlib
 Key: ARROW-8346
 URL: https://issues.apache.org/jira/browse/ARROW-8346
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, GLib
Reporter: Neal Richardson
 Fix For: 0.17.0


See https://github.com/apache/arrow/runs/564610412 for example.

{code}
Using 'PKG_CONFIG_PATH' from environment with value: '/usr/local/lib/pkgconfig'
Run-time dependency gobject-2.0 found: YES 2.64.1
Run-time dependency gio-2.0 found: NO (tried framework and cmake)

c_glib/arrow-glib/meson.build:210:0: ERROR: Could not generate cargs for 
gio-2.0:
Package zlib was not found in the pkg-config search path.
Perhaps you should add the directory containing `zlib.pc'
to the PKG_CONFIG_PATH environment variable
Package 'zlib', required by 'gio-2.0', not found


A full log can be found at 
/Users/runner/runners/2.168.0/work/arrow/arrow/build/c_glib/meson-logs/meson-log.txt
{code}
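
The log's own hint (add the directory containing {{zlib.pc}} to {{PKG_CONFIG_PATH}}) could be applied in the CI job along these lines. This is only a sketch, not necessarily the fix that was adopted: the helper name {{prepend_pkg_config_path}} is made up for illustration, and the {{brew --prefix zlib}} location in the usage comment assumes zlib comes from Homebrew.

```shell
# Hypothetical helper: put a directory at the front of PKG_CONFIG_PATH
# unless it is already listed there.
prepend_pkg_config_path() {
  dir=$1
  case ":${PKG_CONFIG_PATH:-}:" in
    *":$dir:"*) ;;  # already on the search path, nothing to do
    *) PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
  esac
  export PKG_CONFIG_PATH
}

# Usage (assumes a Homebrew-installed zlib; not run here):
#   prepend_pkg_config_path "$(brew --prefix zlib)/lib/pkgconfig"
```

Prepending rather than appending means the added directory wins if another zlib.pc is already on the search path.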





[jira] [Created] (ARROW-8337) [Release] Verify release candidate wheels without using conda

2020-04-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8337:
--

 Summary: [Release] Verify release candidate wheels without using 
conda
 Key: ARROW-8337
 URL: https://issues.apache.org/jira/browse/ARROW-8337
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Neal Richardson


See final comments on ARROW-2880





[jira] [Created] (ARROW-8335) [Release] Add crossbow jobs to run release verification

2020-04-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8335:
--

 Summary: [Release] Add crossbow jobs to run release verification
 Key: ARROW-8335
 URL: https://issues.apache.org/jira/browse/ARROW-8335
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


Workflow: edit version number and rc number in template in 
{{dev/release/github.verify.yml}}, make PR, and do 

* {{@github-actions crossbow submit -g verify-rc}} to run everything
* {{@github-actions crossbow submit -g verify-rc-wheel|source|binary}} to run 
those groups
* Other groups at {{verify-rc-wheel|source-macos|ubuntu|windows}}, 
{{verify-rc-source-cpp|csharp|java|etc.}}
* Individual workflows at e.g. {{verify-rc-wheel-windows}}, 
{{verify-rc-source-macos-csharp}}. We could break out the wheel verification by 
python version (maybe we should), but that requires changes to the verification 
scripts themselves.

Running the main {{verify-rc}} group will put a ton of workflow svg badges on
the PR so we can see at a glance what is passing and failing. If things fail
when running all, we can push fixes to the verification script to the branch
and retry just those that failed.
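
The {{a|b|c}} shorthand in the list above stands for one group name per alternative. As a small illustration (the helper name {{crossbow_comments}} is hypothetical, and only group names written in this issue are used), the PR comments for the wheel, source and binary groups can be spelled out like this:

```shell
# Print one PR comment per verification group.
# With no arguments, print the comment that runs everything.
crossbow_comments() {
  if [ "$#" -eq 0 ]; then
    echo "@github-actions crossbow submit -g verify-rc"
    return 0
  fi
  for kind in "$@"; do
    echo "@github-actions crossbow submit -g verify-rc-$kind"
  done
}

crossbow_comments wheel source binary
```

Each printed line is meant to be posted as its own comment on the PR.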





[jira] [Created] (ARROW-8325) [R][CI] Stop including boost in R windows bundle

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8325:
--

 Summary: [R][CI] Stop including boost in R windows bundle
 Key: ARROW-8325
 URL: https://issues.apache.org/jira/browse/ARROW-8325
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson








[jira] [Created] (ARROW-8324) [R] Add read/write_ipc_file separate from _feather

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8324:
--

 Summary: [R] Add read/write_ipc_file separate from _feather
 Key: ARROW-8324
 URL: https://issues.apache.org/jira/browse/ARROW-8324
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson


See [https://github.com/apache/arrow/pull/6771#issuecomment-608133760]
{quote}Let's add read/write_ipc_file also? I'm wary of the "version" option in 
"write_feather" and the Feather version inference capability in "read_feather". 
It's potentially confusing and we may choose to add options to 
write_ipc_file/read_ipc_file that are more developer centric, having to do with 
particulars in the IPC format, that are not relevant or appropriate for the 
Feather APIs.

IMHO it's best for "Feather format" to remain an abstracted higher-level 
concept with its use of the "IPC file format" as an implementation detail, and 
segregated from the other things.
{quote}





[jira] [Created] (ARROW-8309) [CI] C++/Java/Rust workflows should trigger on changes to Flight.proto

2020-04-01 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8309:
--

 Summary: [CI] C++/Java/Rust workflows should trigger on changes to 
Flight.proto
 Key: ARROW-8309
 URL: https://issues.apache.org/jira/browse/ARROW-8309
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


The Flight DoExchange format change caused Rust build failures (ARROW-8308). We 
would have caught these in the format change patch, but the Rust builds weren't 
triggered on changes to {{format/Flight.proto}}.





[jira] [Created] (ARROW-8301) [C++][Python][R] Handle ChunkedArray and Table in C data interface

2020-03-31 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8301:
--

 Summary: [C++][Python][R] Handle ChunkedArray and Table in C data 
interface
 Key: ARROW-8301
 URL: https://issues.apache.org/jira/browse/ARROW-8301
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C, C++, Python, R
Reporter: Neal Richardson
Assignee: Antoine Pitrou


Currently the C data interface handles Array and RecordBatch, but we're also
going to need ChunkedArray and Table.





[jira] [Created] (ARROW-8300) [R] Documentation and changelog updates for 0.17

2020-03-31 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8300:
--

 Summary: [R] Documentation and changelog updates for 0.17
 Key: ARROW-8300
 URL: https://issues.apache.org/jira/browse/ARROW-8300
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0








[jira] [Created] (ARROW-8266) [C++] Add backup mirrors for external project source downloads

2020-03-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8266:
--

 Summary: [C++] Add backup mirrors for external project source 
downloads
 Key: ARROW-8266
 URL: https://issues.apache.org/jira/browse/ARROW-8266
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


As we've seen a number of times, most recently with boost, our builds sometimes
fail because a bundled dependency could not be downloaded. To reduce this
risk, we can add alternate URLs to the CMake ExternalProject definitions, so
that CMake will attempt to download from the second location if the first fails
(https://cmake.org/cmake/help/latest/module/ExternalProject.html). This feature
is available in CMake >= 3.7.
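
CMake's ExternalProject handles the fallback itself once several URLs are listed, so nothing like the following is needed in the build. Purely to illustrate the try-the-next-mirror behaviour described above, here is a shell analogue; {{fetch_one}}, {{fetch_with_fallback}} and the URLs in the usage comment are hypothetical and not part of Arrow's build scripts.

```shell
# Download one URL to a destination file.
# curl: -f fail on HTTP errors, -sS quiet but report errors, -L follow redirects.
fetch_one() {
  curl -fsSL -o "$1" "$2"
}

# Try each URL in order and stop at the first one that downloads.
fetch_with_fallback() {
  dest=$1
  shift
  for url in "$@"; do
    if fetch_one "$dest" "$url"; then
      return 0
    fi
    echo "download failed: $url" >&2
  done
  return 1
}

# Usage (placeholder URLs, not run here):
#   fetch_with_fallback boost.tar.gz \
#     "https://primary.example.com/boost.tar.gz" \
#     "https://mirror.example.com/boost.tar.gz"
```

The function returns non-zero only when every listed location has failed, which mirrors the intent of listing a backup mirror.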





[jira] [Created] (ARROW-8222) [C++] Use bcp to make a slim boost for bundled build

2020-03-25 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8222:
--

 Summary: [C++] Use bcp to make a slim boost for bundled build
 Key: ARROW-8222
 URL: https://issues.apache.org/jira/browse/ARROW-8222
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson


We don't use much of Boost (just system, filesystem, and regex), but when we do
a bundled build, we still download and extract all of Boost. The tarball itself
is 113 MB, and expanded it is over 700 MB. This can be slow, and it requires a
lot of free disk space that we don't really need.

[bcp|https://www.boost.org/doc/libs/1_72_0/tools/bcp/doc/html/index.html] is a 
boost tool that lets you extract a subset of boost, resolving any of its 
necessary dependencies across boost. The savings for us could be huge:

{code}
mkdir test
./bcp system.hpp filesystem.hpp regex.hpp test
tar -czf test.tar.gz test/
{code}

The resulting tarball is 885K (kilobytes!). 

{{bcp}} also lets you re-namespace, so this would (IIUC) solve ARROW-4286 as 
well.

We would need a place to host this tarball, and we would have to update it
whenever we (1) bump the boost version or (2) add a new boost library 
dependency. This patch would of course include a script that would generate the 
tarball. Given the small size, we could also consider just vendoring it.
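
The tarball-generating script mentioned above might look roughly like this. It only wraps the three commands shown earlier; the function name, the arguments, and the assumption that a built {{bcp}} executable sits at the top of the Boost source tree are illustrative rather than taken from any eventual patch.

```shell
# Boost headers Arrow relies on, per the issue text.
BOOST_HEADERS="system.hpp filesystem.hpp regex.hpp"

# make_slim_boost BOOST_SRC OUT_DIR
#   BOOST_SRC: Boost source tree with ./bcp at its top level (assumption)
#   OUT_DIR:   absolute path of the directory to fill;
#              OUT_DIR.tar.gz is written next to it
make_slim_boost() {
  boost_src=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  # bcp copies the named headers plus everything they depend on.
  # BOOST_HEADERS is left unquoted on purpose so it splits into words.
  (cd "$boost_src" && ./bcp $BOOST_HEADERS "$out_dir") || return 1
  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}

# Usage (paths are placeholders, not run here):
#   make_slim_boost /path/to/boost_1_72_0 /tmp/boost_slim
```

Keeping the header list in one variable means adding a new Boost dependency is a one-line change before regenerating the tarball.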





[jira] [Created] (ARROW-8206) [R] Minor fix for backwards compatibility on Linux installation

2020-03-24 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8206:
--

 Summary: [R] Minor fix for backwards compatibility on Linux 
installation
 Key: ARROW-8206
 URL: https://issues.apache.org/jira/browse/ARROW-8206
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


In 0.16, the recommendation was to set {{LIBARROW_DOWNLOAD=true}} to install 
with dependencies, and this would include getting a binary. But the recent 
refactor to Linux installation didn't carry this setting forward correctly.





[jira] [Created] (ARROW-8188) [R] Adapt to latest checks in R-devel

2020-03-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8188:
--

 Summary: [R] Adapt to latest checks in R-devel
 Key: ARROW-8188
 URL: https://issues.apache.org/jira/browse/ARROW-8188
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


See https://github.com/ursa-labs/crossbow/runs/526813242 for example.

1. checkbashisms is now complaining about a few things
2. Latest R-devel actually runs the donttest examples with --as-cran, and one 
fails. 





[jira] [Created] (ARROW-8187) [R] Make test assertions robust to i18n

2020-03-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8187:
--

 Summary: [R] Make test assertions robust to i18n
 Key: ARROW-8187
 URL: https://issues.apache.org/jira/browse/ARROW-8187
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Antoine Pitrou
Assignee: Neal Richardson
 Fix For: 0.17.0


{code}
── 1. Failure: codec_is_available (@test-compressed.R#22)  ─
`codec_is_available("sdfasdf")` threw an error with unexpected message.
Expected match: "'arg' should be one of"
Actual message: "'arg' doit être un de “UNCOMPRESSED”, “SNAPPY”, “GZIP”, 
“BROTLI”, “ZSTD”, “LZ4”, “LZO”, “BZ2”"
Backtrace:
  1. testthat::expect_error(codec_is_available("sdfasdf"), "'arg' should be one 
of") testthat/test-compressed.R:22:2
  6. arrow::codec_is_available("sdfasdf")
  8. arrow:::compression_from_name(type)
  9. purrr::map_int(...)
 10. arrow:::.f(.x[[i]], ...)
 11. base::match.arg(toupper(.x), names(CompressionType))

── 2. Failure: time type unit validation (@test-data-type.R#298)  ──
`time32("years")` threw an error with unexpected message.
Expected match: "'arg' should be one of"
Actual message: "'arg' doit être un de “ms”, “s”"
Backtrace:
 1. testthat::expect_error(time32("years"), "'arg' should be one of") 
testthat/test-data-type.R:298:2
 6. arrow::time32("years")
 7. base::match.arg(unit)

── 3. Failure: time type unit validation (@test-data-type.R#305)  ──
`time64("years")` threw an error with unexpected message.
Expected match: "'arg' should be one of"
Actual message: "'arg' doit être un de “ns”, “us”"
Backtrace:
 1. testthat::expect_error(time64("years"), "'arg' should be one of") 
testthat/test-data-type.R:305:2
 6. arrow::time64("years")
 7. base::match.arg(unit)

── 4. Failure: decimal type and validation (@test-data-type.R#387)  
`decimal()` threw an error with unexpected message.
Expected match: "argument \"precision\" is missing, with no default"
Actual message: "l'argument \"precision\" est manquant, avec aucune valeur par 
défaut"
Backtrace:
 1. testthat::expect_error(decimal(), "argument \"precision\" is missing, with 
no default") testthat/test-data-type.R:387:2
 6. arrow::decimal()

── 5. Failure: decimal type and validation (@test-data-type.R#389)  
`decimal(4)` threw an error with unexpected message.
Expected match: "argument \"scale\" is missing, with no default"
Actual message: "l'argument \"scale\" est manquant, avec aucune valeur par 
défaut"
Backtrace:
 1. testthat::expect_error(decimal(4), "argument \"scale\" is missing, with no 
default") testthat/test-data-type.R:389:2
 6. arrow::decimal(4)
{code}





[jira] [Created] (ARROW-8139) [C++] FileSystem enum causes attributes warning

2020-03-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8139:
--

 Summary: [C++] FileSystem enum causes attributes warning
 Key: ARROW-8139
 URL: https://issues.apache.org/jira/browse/ARROW-8139
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


See e.g. 
https://github.com/apache/arrow/runs/512427577?check_suite_focus=true#step:7:996

{code}
In file included from 
/arrow/r/check/arrow.Rcheck/00_pkg_src/arrow/libarrow/arrow-0.16.0.9000/include/arrow/dataset/discovery.h:31:0,
 from 
/arrow/r/check/arrow.Rcheck/00_pkg_src/arrow/libarrow/arrow-0.16.0.9000/include/arrow/dataset/api.h:21,
 from ./arrow_types.h:203,
 from array_to_vector.cpp:18:
/arrow/r/check/arrow.Rcheck/00_pkg_src/arrow/libarrow/arrow-0.16.0.9000/include/arrow/filesystem/filesystem.h:65:1:
 warning: type attributes ignored after type is already defined [-Wattributes]
{code}

This isn't new but I've been staring at the R Linux builds a lot and wanted to 
clean this up.





[jira] [Created] (ARROW-8103) [R] Make default Linux build more minimal

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8103:
--

 Summary: [R] Make default Linux build more minimal
 Key: ARROW-8103
 URL: https://issues.apache.org/jira/browse/ARROW-8103
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Packaging, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


So that we can build on CRAN as quickly as possible, and thus make the default 
experience for users installing the package better--no environment variable 
required to get something functional.





[jira] [Created] (ARROW-8095) [CI][Crossbow] Nightly turbodbc job fails

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8095:
--

 Summary: [CI][Crossbow] Nightly turbodbc job fails
 Key: ARROW-8095
 URL: https://issues.apache.org/jira/browse/ARROW-8095
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Neal Richardson
 Fix For: 0.17.0


Turbodbc fails to compile (both "master" and "latest" versions) with this error:

{code}
FAILED: 
cpp/turbodbc_arrow/Library/CMakeFiles/turbodbc_arrow_support.dir/src/arrow_result_set.cpp.o
 
/opt/conda/envs/arrow/bin/x86_64-conda_cos6-linux-gnu-c++  
-Dturbodbc_arrow_support_EXPORTS -I/turbodbc/cpp/turbodbc_arrow/Library 
-I/turbodbc/cpp/turbodbc_arrow/../cpp_odbc/Library 
-I/turbodbc/cpp/turbodbc_arrow/../turbodbc/Library -I/turbodbc/pybind11/include 
-isystem /opt/conda/envs/arrow/include -isystem 
/opt/conda/envs/arrow/include/python3.7m -isystem 
/opt/conda/envs/arrow/lib/python3.7/site-packages/numpy/core/include 
-fvisibility-inlines-hidden -Wall -Wextra -g -O0 -pedantic -fPIC 
-fvisibility=hidden   -std=c++11 -std=c++14 -MD -MT 
cpp/turbodbc_arrow/Library/CMakeFiles/turbodbc_arrow_support.dir/src/arrow_result_set.cpp.o
 -MF 
cpp/turbodbc_arrow/Library/CMakeFiles/turbodbc_arrow_support.dir/src/arrow_result_set.cpp.o.d
 -o 
cpp/turbodbc_arrow/Library/CMakeFiles/turbodbc_arrow_support.dir/src/arrow_result_set.cpp.o
 -c /turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp: In member 
function 'arrow::Status 
turbodbc_arrow::{anonymous}::StringDictionaryBuilderProxy::AppendProxy(const 
char*, int32_t)':
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:67:36: error: no 
matching function for call to 
'turbodbc_arrow::{anonymous}::StringDictionaryBuilderProxy::Append(const 
char*&, int32_t&)'
 return Append(value, length);
^
In file included from /opt/conda/envs/arrow/include/arrow/builder.h:26:0,
 from /opt/conda/envs/arrow/include/arrow/api.h:26,
 from 
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:6:
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:143:10: note: 
candidate: arrow::Status arrow::internal::DictionaryBuilderBase::Append(const Scalar&) [with BuilderType = arrow::AdaptiveIntBuilder; T = 
arrow::StringType; arrow::internal::DictionaryBuilderBase::Scalar = nonstd::sv_lite::basic_string_view]
   Status Append(const Scalar& value) {
  ^~
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:143:10: note:   
candidate expects 1 argument, 2 provided
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:156:43: note: 
candidate: template arrow::enable_if_fixed_size_binary arrow::internal::DictionaryBuilderBase::Append(const uint8_t*) [with T1 = T1; BuilderType = 
arrow::AdaptiveIntBuilder; T = arrow::StringType]
   enable_if_fixed_size_binary Append(const uint8_t* value) {
   ^~
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:156:43: note:   
template argument deduction/substitution failed:
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:67:36: note:   
candidate expects 1 argument, 2 provided
 return Append(value, length);
^
In file included from /opt/conda/envs/arrow/include/arrow/builder.h:26:0,
 from /opt/conda/envs/arrow/include/arrow/api.h:26,
 from 
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:6:
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:162:43: note: 
candidate: template arrow::enable_if_fixed_size_binary arrow::internal::DictionaryBuilderBase::Append(const char*) [with T1 = T1; BuilderType = arrow::AdaptiveIntBuilder; 
T = arrow::StringType]
   enable_if_fixed_size_binary Append(const char* value) {
   ^~
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:162:43: note:   
template argument deduction/substitution failed:
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:67:36: note:   
candidate expects 1 argument, 2 provided
 return Append(value, length);
^
In file included from /opt/conda/envs/arrow/include/arrow/builder.h:26:0,
 from /opt/conda/envs/arrow/include/arrow/api.h:26,
 from 
/turbodbc/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp:6:
/opt/conda/envs/arrow/include/arrow/array/builder_dict.h:168:37: note: 
candidate: template arrow::enable_if_binary_like 
arrow::internal::DictionaryBuilderBase::Append(const uint8_t*, 
int32_t) [with T1 = T1; BuilderType = arrow::AdaptiveIntBuilder; T = 
arrow::StringType]
   enable_if_binary_like Append(const uint8_t* value, int32_t 
length) {
 ^~

[jira] [Created] (ARROW-8094) [CI][Crossbow] Nightly valgrind test fails

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8094:
--

 Summary: [CI][Crossbow] Nightly valgrind test fails
 Key: ARROW-8094
 URL: https://issues.apache.org/jira/browse/ARROW-8094
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Neal Richardson
 Fix For: 0.17.0


See https://circleci.com/gh/ursa-labs/crossbow/9162





[jira] [Created] (ARROW-8093) [CI][Crossbow] Pandas integration test fails

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8093:
--

 Summary: [CI][Crossbow] Pandas integration test fails
 Key: ARROW-8093
 URL: https://issues.apache.org/jira/browse/ARROW-8093
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Neal Richardson
Assignee: Joris Van den Bossche
 Fix For: 0.17.0


{code}
=== FAILURES ===
___ test_conversion_extensiontype_to_extensionarray 

monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f029f03f2a0>

def test_conversion_extensiontype_to_extensionarray(monkeypatch):
# converting extension type to linked pandas ExtensionDtype/Array
import pandas.core.internals as _int

storage = pa.array([1, 2, 3, 4], pa.int64())
arr = pa.ExtensionArray.from_storage(MyCustomIntegerType(), storage)
table = pa.table({'a': arr})

if LooseVersion(pd.__version__) < "0.26.0.dev":
# ensure pandas Int64Dtype has the protocol method (for older 
pandas)
monkeypatch.setattr(
pd.Int64Dtype, '__from_arrow__', _Int64Dtype__from_arrow__,
raising=False)

# extension type points to Int64Dtype, which knows how to create a
# pandas ExtensionArray
>   result = table.to_pandas()

opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/tests/test_pandas.py:3633:
 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/array.pxi:566: in pyarrow.lib._PandasConvertible.to_pandas
???
pyarrow/table.pxi:1425: in pyarrow.lib.Table._to_pandas
???
opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/pandas_compat.py:764: 
in table_to_blockmanager
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1102: 
in _table_to_blocks
for item in result]
opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/pandas_compat.py:1102: 
in <listcomp>
for item in result]
opt/conda/envs/arrow/lib/python3.7/site-packages/pyarrow/pandas_compat.py:723: 
in _reconstruct_block
pd_ext_arr = pandas_dtype.__from_arrow__(arr)
opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/arrays/integer.py:108:
 in __from_arrow__
array = array.cast(pyarrow_type)
pyarrow/table.pxi:240: in pyarrow.lib.ChunkedArray.cast
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowNotImplementedError: No cast implemented from 
extension to int64
{code}

https://circleci.com/gh/ursa-labs/crossbow/9156





[jira] [Created] (ARROW-8092) [CI][Crossbow] OSX wheels fail on bundled bzip2

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8092:
--

 Summary: [CI][Crossbow] OSX wheels fail on bundled bzip2
 Key: ARROW-8092
 URL: https://issues.apache.org/jira/browse/ARROW-8092
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Packaging, Python
Reporter: Neal Richardson
 Fix For: 0.17.0


See e.g. 

 





[jira] [Created] (ARROW-8091) [CI][Crossbow] Fix nightly homebrew and R failures

2020-03-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8091:
--

 Summary: [CI][Crossbow] Fix nightly homebrew and R failures
 Key: ARROW-8091
 URL: https://issues.apache.org/jira/browse/ARROW-8091
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.17.0


R: 
[https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=8156&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=127]

Homebrew: 
[https://travis-ci.org/github/ursa-labs/crossbow/builds/661245549#L3392] 





[jira] [Created] (ARROW-8025) [C++] Implement cast to Binary and FixedSizeBinary

2020-03-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8025:
--

 Summary: [C++] Implement cast to Binary and FixedSizeBinary
 Key: ARROW-8025
 URL: https://issues.apache.org/jira/browse/ARROW-8025
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Compute
Reporter: Neal Richardson


It appears you can cast from Binary to String but not the other way. 





[jira] [Created] (ARROW-8024) [R] Bindings for BinaryType and FixedBinaryType

2020-03-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8024:
--

 Summary: [R] Bindings for BinaryType and FixedBinaryType
 Key: ARROW-8024
 URL: https://issues.apache.org/jira/browse/ARROW-8024
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Prerequisite for ARROW-6235 (converting BinaryArray data to R). 





[jira] [Created] (ARROW-8002) [C++][Dataset] Dataset writing should let you (re)partition the data

2020-03-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8002:
--

 Summary: [C++][Dataset] Dataset writing should let you 
(re)partition the data
 Key: ARROW-8002
 URL: https://issues.apache.org/jira/browse/ARROW-8002
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset, Python, R
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 1.0.0








[jira] [Created] (ARROW-8001) [C++][Dataset] R and Python bindings for dataset writing

2020-03-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8001:
--

 Summary: [C++][Dataset] R and Python bindings for dataset writing
 Key: ARROW-8001
 URL: https://issues.apache.org/jira/browse/ARROW-8001
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset, Python, R
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 1.0.0








[jira] [Created] (ARROW-7988) [R] Fix on.exit calls in reticulate bindings

2020-03-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7988:
--

 Summary: [R] Fix on.exit calls in reticulate bindings
 Key: ARROW-7988
 URL: https://issues.apache.org/jira/browse/ARROW-7988
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7987) [CI][R] Fix for verbose nightly builds

2020-03-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7987:
--

 Summary: [CI][R] Fix for verbose nightly builds
 Key: ARROW-7987
 URL: https://issues.apache.org/jira/browse/ARROW-7987
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-7983





[jira] [Created] (ARROW-7984) [R] Check for valid inputs in more places

2020-03-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7984:
--

 Summary: [R] Check for valid inputs in more places
 Key: ARROW-7984
 URL: https://issues.apache.org/jira/browse/ARROW-7984
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


In trying to reproduce bug reports, I typically hit code paths I don't usually
use, and I often give some input that I expect to work but that instead causes
a segfault. That's no good.





[jira] [Created] (ARROW-7983) [CI][R] Nightly builds should be more verbose when they fail

2020-03-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7983:
--

 Summary: [CI][R] Nightly builds should be more verbose when they 
fail
 Key: ARROW-7983
 URL: https://issues.apache.org/jira/browse/ARROW-7983
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7967) [CI][Crossbow] Move autobrew job back to old macOS

2020-02-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7967:
--

 Summary: [CI][Crossbow] Move autobrew job back to old macOS
 Key: ARROW-7967
 URL: https://issues.apache.org/jira/browse/ARROW-7967
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson


Followup to ARROW-7923. After hopefully fixing the underlying issue somewhere 
in Travis, revert the changes in that issue so that we're still testing on old 
macOS.





[jira] [Created] (ARROW-7962) [R][Dataset] Followup to "Consolidate Source and Dataset classes"

2020-02-28 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7962:
--

 Summary: [R][Dataset] Followup to "Consolidate Source and Dataset 
classes"
 Key: ARROW-7962
 URL: https://issues.apache.org/jira/browse/ARROW-7962
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Dataset, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


This was pushed to ARROW-7886 but it got dropped in a force push.





[jira] [Created] (ARROW-7923) [CI][Crossbow] macOS autobrew fails on homebrew-versions

2020-02-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7923:
--

 Summary: [CI][Crossbow] macOS autobrew fails on homebrew-versions
 Key: ARROW-7923
 URL: https://issues.apache.org/jira/browse/ARROW-7923
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Packaging, R
Reporter: Neal Richardson
 Fix For: 1.0.0


See e.g. https://travis-ci.org/ursa-labs/crossbow/builds/653768049#L97. 
According to https://github.com/Homebrew/brew/issues/5734, there needs to be 
{{brew untap homebrew-versions}} before {{brew update}}, except this is 
happening in the Travis workflow in the setup stage, so we can't. Will need to 
change the travis-build config or base image upstream, or look for a different 
workaround.





[jira] [Created] (ARROW-7922) [CI][Crossbow] Nightly conda osx builds fail (brew bundle)

2020-02-23 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7922:
--

 Summary: [CI][Crossbow] Nightly conda osx builds fail (brew bundle)
 Key: ARROW-7922
 URL: https://issues.apache.org/jira/browse/ARROW-7922
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Packaging
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See e.g. https://travis-ci.org/ursa-labs/crossbow/builds/653768373#L129. 
Apparently a new Homebrew release changed some dependency of {{brew bundle}} so 
we need to be sure to {{brew update}} first: 
https://travis-ci.community/t/macos-build-fails-because-of-homebrew-bundle-unknown-command/7296/6.





[jira] [Created] (ARROW-7920) [R] Fill in some missing input validation

2020-02-22 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7920:
--

 Summary: [R] Fill in some missing input validation
 Key: ARROW-7920
 URL: https://issues.apache.org/jira/browse/ARROW-7920
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


I hit some segfaults trying to reproduce an issue because of missing input 
validation.





[jira] [Created] (ARROW-7919) [R] install_arrow() should conda install if appropriate

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7919:
--

 Summary: [R] install_arrow() should conda install if appropriate
 Key: ARROW-7919
 URL: https://issues.apache.org/jira/browse/ARROW-7919
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Like, check {{if (grepl("conda", R.Version()$platform))}} and if so then 
{{system("conda install ...")}}. Error if nightly == TRUE because we don't host 
conda nightlies yet.

This would help with issues like https://github.com/apache/arrow/issues/6448
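
The decision described above can be sketched as follows. This is only an illustration of the control flow, written in Python rather than R; the function name, the package name {{r-arrow}}, and the {{conda-forge}} channel are assumptions, not something this issue specifies.

```python
def conda_install_command(platform, nightly=False):
    """Return the conda command to run, or None if R was not built by conda.

    `platform` stands in for the string R reports in R.Version()$platform.
    """
    if "conda" not in platform:
        # Not a conda build of R: fall back to the regular installation path.
        return None
    if nightly:
        # The issue asks for an error here because conda nightlies are not hosted.
        raise ValueError("nightly conda packages are not available yet")
    # Package and channel names are assumptions made for illustration.
    return ["conda", "install", "-y", "-c", "conda-forge", "r-arrow"]
```

A platform string such as "x86_64-conda_cos6-linux-gnu" would take the conda branch, while "x86_64-pc-linux-gnu" would not.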





[jira] [Created] (ARROW-7918) [R] Improve instructions for conda users in installation vignette

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7918:
--

 Summary: [R] Improve instructions for conda users in installation vignette
 Key: ARROW-7918
 URL: https://issues.apache.org/jira/browse/ARROW-7918
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7913) [C++][Python][R] C++ implementation of C data protocol

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7913:
--

 Summary: [C++][Python][R] C++ implementation of C data protocol
 Key: ARROW-7913
 URL: https://issues.apache.org/jira/browse/ARROW-7913
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python, R
Affects Versions: 1.0.0
Reporter: Neal Richardson
Assignee: Antoine Pitrou


See ARROW-7912





[jira] [Created] (ARROW-7912) [Format] C data interface

2020-02-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7912:
--

 Summary: [Format] C data interface
 Key: ARROW-7912
 URL: https://issues.apache.org/jira/browse/ARROW-7912
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Affects Versions: 1.0.0
Reporter: Neal Richardson
Assignee: Antoine Pitrou


Apache Arrow is designed to be a universal in-memory format for the
representation of tabular ("columnar") data. However, some projects may face a
difficult choice between either depending on a fast-evolving project such as
the Arrow C++ library, or having to reimplement adapters for data interchange,
which may require significant, redundant development effort.

The Arrow C data interface defines a very small, stable set of C definitions
that can be easily *copied* in any project's source code and used for columnar
data interchange in the Arrow format.  For non-C/C++ languages and runtimes,
it should be almost as easy to translate the C definitions into the
corresponding C FFI declarations.

Applications and libraries can therefore work with Arrow memory without
necessarily using Arrow libraries or reinventing the wheel. Developers can
choose between tight integration with the Arrow *software project*
(benefitting from the growing array of facilities exposed by e.g. the C++ or
Java implementations of Apache Arrow, but with the cost of a dependency) or
minimal integration with the Arrow *format* only.
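
To illustrate the point about translating the C definitions into FFI declarations, here is a sketch using Python's ctypes. The issue text itself does not list the struct fields; the layout below assumes the ArrowSchema definition as it appears in the published Arrow C data interface specification, and the field types chosen on the Python side are one possible mapping.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of the C struct ArrowSchema (layout assumed from the spec)."""


# The struct refers to itself, so the fields are attached after the class exists.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),       # type description string, e.g. b"i" for int32
    ("name", ctypes.c_char_p),         # optional field name
    ("metadata", ctypes.c_void_p),     # binary-encoded metadata, so not a C string
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

# A producer would fill the struct in like this (no release callback set here,
# so this value is for illustration only and must not be handed to a consumer).
schema = ArrowSchema(format=b"i", name=b"x", flags=0, n_children=0)
```

No Arrow library is imported: the declaration is copied into the project, which is the minimal "format only" integration the text describes.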







[jira] [Created] (ARROW-7902) [Integration] Unskip nested dictionary integration tests

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7902:
--

 Summary: [Integration] Unskip nested dictionary integration tests
 Key: ARROW-7902
 URL: https://issues.apache.org/jira/browse/ARROW-7902
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7901) [Integration][Go] Add null type (and integration test)

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7901:
--

 Summary: [Integration][Go] Add null type (and integration test)
 Key: ARROW-7901
 URL: https://issues.apache.org/jira/browse/ARROW-7901
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go, Integration
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7900) [Integration][JavaScript] Add null type integration test

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7900:
--

 Summary: [Integration][JavaScript] Add null type integration test
 Key: ARROW-7900
 URL: https://issues.apache.org/jira/browse/ARROW-7900
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, JavaScript
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7899) [Integration][Java] null type integration test

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7899:
--

 Summary: [Integration][Java] null type integration test
 Key: ARROW-7899
 URL: https://issues.apache.org/jira/browse/ARROW-7899
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration, Java
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7895) [Python] Remove more python 2.7 cruft

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7895:
--

 Summary: [Python] Remove more python 2.7 cruft
 Key: ARROW-7895
 URL: https://issues.apache.org/jira/browse/ARROW-7895
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7891) [C++] RecordBatch->Equals should also have a check_metadata argument

2020-02-19 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7891:
--

 Summary: [C++] RecordBatch->Equals should also have a check_metadata argument
 Key: ARROW-7891
 URL: https://issues.apache.org/jira/browse/ARROW-7891
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
 Fix For: 1.0.0


Followup to ARROW-7720 and ARROW-7786. Table and Schema both have it, so it 
stands to reason that RecordBatch should too.





[jira] [Created] (ARROW-7881) [C++] Fix pedantic warnings

2020-02-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7881:
--

 Summary: [C++] Fix pedantic warnings
 Key: ARROW-7881
 URL: https://issues.apache.org/jira/browse/ARROW-7881
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Saw this while working on ARROW-7880:

{code}
In file included from /arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/compute/kernel.h:27,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/compute/api.h:22,
                 from ./arrow_types.h:199,
                 from chunkedarray.cpp:18:
/arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/scalar.h:399:2: warning: extra ‘;’ [-Wpedantic]
 };  // namespace internal
  ^
In file included from /arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/compute/api.h:31,
                 from ./arrow_types.h:199,
                 from chunkedarray.cpp:18:
/arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/compute/kernels/mean.h:66:2: warning: extra ‘;’ [-Wpedantic]
 };  // namespace arrow
  ^
In file included from /arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/dataset/file_base.h:29,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/dataset/api.h:22,
                 from ./arrow_types.h:201,
                 from chunkedarray.cpp:18:
/arrow/r/libarrow/arrow-0.16.0.9000/include/arrow/dataset/scanner.h:40:2: warning: extra ‘;’ [-Wpedantic]
 };
  ^
In file included from /arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/encryption.h:28,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/properties.h:29,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/metadata.h:29,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/file_reader.h:26,
                 from /arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/arrow/reader.h:25,
                 from ./arrow_types.h:217,
                 from chunkedarray.cpp:18:
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:319:36: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Boolean, BOOLEAN);
                                    ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:320:32: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Int32, INT32);
                                ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:321:32: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Int64, INT64);
                                ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:322:32: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Int96, INT96);
                                ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:323:32: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Float, FLOAT);
                                ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:324:34: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(Double, DOUBLE);
                                  ^
/arrow/r/libarrow/arrow-0.16.0.9000/include/parquet/schema.h:325:41: warning: extra ‘;’ [-Wpedantic]
 PRIMITIVE_FACTORY(ByteArray, BYTE_ARRAY);
{code}





[jira] [Created] (ARROW-7880) [CI][R] R sanitizer job is not really working

2020-02-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7880:
--

 Summary: [CI][R] R sanitizer job is not really working
 Key: ARROW-7880
 URL: https://issues.apache.org/jira/browse/ARROW-7880
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


It's not failing, but it's not doing anything useful either. It builds the C++ 
library and then installs the R package, but the R package doesn't find the C++ 
library that was just built. The rest of the build then runs without erroring 
but without actually working, just burning electricity.





[jira] [Created] (ARROW-7870) [CI][Packaging] Host nightly wheels on Apache bintray

2020-02-17 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7870:
--

 Summary: [CI][Packaging] Host nightly wheels on Apache bintray
 Key: ARROW-7870
 URL: https://issues.apache.org/jira/browse/ARROW-7870
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Neal Richardson
 Fix For: 1.0.0


See 
https://lists.apache.org/thread.html/r86c46849d8fe77de12821834b12330f0f77c3e7d7d4e6302c9f634d3%40%3Cdev.arrow.apache.org%3E

Investigate whether bintray is a good alternative, and if we use it, add a note 
to our website about nightly builds.





[jira] [Created] (ARROW-7865) [R] Test builds on latest Linux versions

2020-02-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7865:
--

 Summary: [R] Test builds on latest Linux versions
 Key: ARROW-7865
 URL: https://issues.apache.org/jira/browse/ARROW-7865
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See https://github.com/apache/arrow/issues/6435. CRAN might use old/stable 
versions but not everyone is so nostalgic.





[jira] [Created] (ARROW-7864) [R] Make sure bundled installation works even if there are system packages

2020-02-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7864:
--

 Summary: [R] Make sure bundled installation works even if there are system packages
 Key: ARROW-7864
 URL: https://issues.apache.org/jira/browse/ARROW-7864
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 1.0.0


Among the issues:

* In https://github.com/apache/arrow/issues/6435: 0.15 system packages didn't 
have libarrow_dataset, so if they're installed and you try to install 0.16, 
pkg-config probably reports that the packages aren't available and it tries to 
build from source. That's fine except that in the linking step, apparently the 
system packages are being picked up instead of the static libs we just built, 
so installation fails (presumably until you either upgrade the system packages 
or delete them). In general, if we've decided to build/download static libs to 
match the R package, we should make sure those are the ones that get picked up.
* Whenever pkg-config does find packages, check their version and make sure it 
matches the R package version; if it doesn't, don't use them, because they 
almost certainly won't work.
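
A minimal sketch of the version check in the second bullet, in Python for illustration. It assumes that "matching" means the first three numeric components agree (so an R package at 0.16.0.2 accepts system libraries at 0.16.0); both that rule and the function name are assumptions rather than the project's actual logic, and non-numeric suffixes are not handled.

```python
def system_libs_match(libarrow_version, r_package_version):
    """Return True when major.minor.patch of both version strings agree."""

    def core(version):
        # Keep only the first three dot-separated components as integers.
        return tuple(int(part) for part in version.split(".")[:3])

    return core(libarrow_version) == core(r_package_version)
```

The first argument would come from something like {{pkg-config --modversion arrow}}; when the function returns False, the installer would ignore the system packages and use the static libs that match the R package.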





[jira] [Created] (ARROW-7862) [R] Linux installation should run quieter by default

2020-02-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7862:
--

 Summary: [R] Linux installation should run quieter by default
 Key: ARROW-7862
 URL: https://issues.apache.org/jira/browse/ARROW-7862
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0, 0.16.1


No need to blow up the console by default. Also this solves an {{R CMD check}} 
warning that surfaced on CRAN.





[jira] [Created] (ARROW-7860) [C++] Support cast to/from halffloat

2020-02-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7860:
--

 Summary: [C++] Support cast to/from halffloat
 Key: ARROW-7860
 URL: https://issues.apache.org/jira/browse/ARROW-7860
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Compute
Reporter: Neal Richardson
 Fix For: 1.0.0


In trying to do ARROW-7753 I realized I couldn't make a halffloat. I tried 
creating a float64 (as R does naturally) and casting to float16, but it's not 
implemented. Looking at compute/kernels/cast.cc, and the associated source in 
compute/kernels/generated/codegen.py, {{FLOATING_TYPES = ['Float', 'Double']}}. 
Maybe halffloat just needs to be added there? 

Aside: searching through the code, it seems that this limitation of float types 
to float32 and float64 is the norm. 





[jira] [Created] (ARROW-7859) [R] Minor patches for CRAN submission 0.16.0.2

2020-02-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7859:
--

 Summary: [R] Minor patches for CRAN submission 0.16.0.2
 Key: ARROW-7859
 URL: https://issues.apache.org/jira/browse/ARROW-7859
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7853) [CI][Packaging] Add nightly test that pip-installs nightly wheels

2020-02-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7853:
--

 Summary: [CI][Packaging] Add nightly test that pip-installs nightly wheels
 Key: ARROW-7853
 URL: https://issues.apache.org/jira/browse/ARROW-7853
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration, Packaging, Python
Reporter: Neal Richardson
Assignee: Krisztian Szucs
 Fix For: 1.0.0


This would catch issues with wheels that we only encountered during release 
verification.





[jira] [Created] (ARROW-7844) [R] Parquet list column test is flaky

2020-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7844:
--

 Summary: [R] Parquet list column test is flaky
 Key: ARROW-7844
 URL: https://issues.apache.org/jira/browse/ARROW-7844
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Francois Saint-Jacques


See [https://travis-ci.org/ursa-labs/arrow-r-nightly/jobs/649649349#L373-L375] 
for an example on public CI. I was seeing this locally this week but figured 
I'd screwed up my env somehow.

{code}
── 1. Failure: Lists are preserved when writing/reading from Parquet (@test-parq
  `object` not equivalent to `expected`.
  Component "num": Component 1: target is numeric, current is character
{code}

It's not always the same column in the data.frame that is affected. Also 
strange that it's only one column. You'd think that if it were transposing the 
order somehow, you'd get two that were swapped.

The test itself is straightforward 
(https://github.com/apache/arrow/blob/master/r/tests/testthat/test-parquet.R#L124-L137), 
so this is somewhat troubling.





[jira] [Created] (ARROW-7833) [R] Make install_arrow() actually install arrow

2020-02-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7833:
--

 Summary: [R] Make install_arrow() actually install arrow
 Key: ARROW-7833
 URL: https://issues.apache.org/jira/browse/ARROW-7833
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7832) [R] Patches to 0.16.0 release

2020-02-11 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7832:
--

 Summary: [R] Patches to 0.16.0 release
 Key: ARROW-7832
 URL: https://issues.apache.org/jira/browse/ARROW-7832
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


CRAN did not like 0.16.0 as originally submitted. This contains the patches in 
the 0.16.0.1 resubmission.




