[jira] [Updated] (ARROW-9127) [Rust] Update thrift library dependencies

2020-06-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9127:
--
Labels: pull-request-available  (was: )

> [Rust] Update thrift library dependencies
> -
>
> Key: ARROW-9127
> URL: https://issues.apache.org/jira/browse/ARROW-9127
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Andrew Lamb
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update to the latest version of Apache Thrift (0.13)
>  
> Rationale:
> We were trying to update the version of `byteorder` used by an internal 
> project, but arrow/parquet depends on parquet-format-rs, which depends on thrift.
>  
> [~sunchao] recently updated the thrift pin in parquet-format in 
> [https://github.com/apache/arrow/pull/6626], so it is now possible to update 
> the thrift version here as well.
>  
> Updating the thrift dependency was postponed when the dependencies were last updated. 
> See:
> [https://github.com/apache/arrow/pull/6626]
> https://issues.apache.org/jira/browse/ARROW-8124



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-9126) [C++] Trimmed Boost bundle fails to build on Windows

2020-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9126:
--
Labels: pull-request-available  (was: )

> [C++] Trimmed Boost bundle fails to build on Windows
> 
>
> Key: ARROW-9126
> URL: https://issues.apache.org/jira/browse/ARROW-9126
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Cuong Nguyen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Build with the following commands:
> {code:java}
> mkdir build
> cd build
> cmake .. -DARROW_PARQUET=ON
> cmake --build .{code}
> Error from build log
> {code:java}
> .\boost/graph/two_bit_color_map.hpp(106): fatal error C1083: Cannot open 
> include file: 'boost/graph/detail/empty_header.hpp': No such file or directory
> {code}
> This was because configuring Boost to build a subset of libraries doesn't 
> work on Windows as it does on Linux. As a result, all libraries, including 
> those being trimmed, were built:
> {code:java}
> Component configuration:
>  - atomic : building
>  - chrono : building
>  - container : building
>  - date_time : building
>  - exception : building
>  - filesystem : building
>  - headers : building
>  - iostreams : building
>  - locale : building
>  - log : building
>  - mpi : building
>  - program_options : building
>  - python : building
>  - random : building
>  - regex : building
>  - serialization : building
>  - system : building
>  - test : building
>  - thread : building
>  - timer : building
>  - wave : building
> {code}



[jira] [Updated] (ARROW-9125) [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind

2020-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9125:
--
Labels: pull-request-available  (was: )

> [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
> 
>
> Key: ARROW-9125
> URL: https://issues.apache.org/jira/browse/ARROW-9125
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9124) [Rust][Datafusion] DFParser should consume sql query as &str instead of String

2020-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9124:
--
Labels: pull-request-available  (was: )

> [Rust][Datafusion] DFParser should consume sql query as &str instead of String
> --
>
> Key: ARROW-9124
> URL: https://issues.apache.org/jira/browse/ARROW-9124
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's more efficient to use &str instead of String.



[jira] [Updated] (ARROW-9123) [Python][wheel] Use libzstd.a explicitly

2020-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9123:
--
Labels: pull-request-available  (was: )

> [Python][wheel] Use libzstd.a explicitly
> 
>
> Key: ARROW-9123
> URL: https://issues.apache.org/jira/browse/ARROW-9123
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ARROW_ZSTD_USE_SHARED}} was introduced by ARROW-9084. We need to set 
> {{ARROW_ZSTD_USE_SHARED=OFF}} explicitly to use the static zstd library.



[jira] [Updated] (ARROW-9116) [C++] Add BinaryArray::total_values_length()

2020-06-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9116:
--
Labels: pull-request-available  (was: )

> [C++] Add BinaryArray::total_values_length()
> 
>
> Key: ARROW-9116
> URL: https://issues.apache.org/jira/browse/ARROW-9116
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's often useful to compute the total data size of a binary array.
> Sample implementation:
> {code:c++}
>   int64_t total_values_length() const {
>     return raw_value_offsets_[length() + data_->offset] -
>            raw_value_offsets_[data_->offset];
>   }
> {code}
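The sample above relies on the binary layout: all values are stored back to back in one data buffer, and an offsets buffer with one more entry than the number of values marks where each value starts; for a sliced array, `data_->offset` shifts which offsets are read. A minimal Python sketch of the same arithmetic, using a plain list as a stand-in for the offsets buffer (the function name and the list representation are illustrative assumptions, not Arrow's API):

```python
from typing import Sequence


def total_values_length(offsets: Sequence[int], offset: int, length: int) -> int:
    """Return the number of data bytes spanned by `length` values.

    `offsets` stands in for raw_value_offsets_ (one more entry than the
    unsliced array has values) and `offset` stands in for data_->offset.
    """
    # The slice's values occupy data[offsets[offset] : offsets[offset + length]].
    return offsets[offset + length] - offsets[offset]


# Values "ab", "cde", "", "fghi" stored contiguously as b"abcdefghi".
offsets = [0, 2, 5, 5, 9]
print(total_values_length(offsets, 0, 4))  # whole array -> 9
print(total_values_length(offsets, 1, 2))  # slice ["cde", ""] -> 3
```

Only two offset lookups are needed, so the cost is constant regardless of how many values the array holds.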



[jira] [Updated] (ARROW-9120) [C++] Lint and Format _internal headers

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9120:
--
Labels: pull-request-available  (was: )

> [C++] Lint and Format _internal headers
> ---
>
> Key: ARROW-9120
> URL: https://issues.apache.org/jira/browse/ARROW-9120
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.17.1
>Reporter: Ben Kietzman
>Assignee: Wes McKinney
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, headers named /*_internal.h/ are neither clang-formatted nor 
> cpplinted. Since they're not exported, the CLI lint checks (forbid , nullptr, 
> ...) need not be applied to them.



[jira] [Updated] (ARROW-8942) [R] Detect compression in reading CSV/JSON

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8942:
--
Labels: pull-request-available  (was: )

> [R] Detect compression in reading CSV/JSON
> --
>
> Key: ARROW-8942
> URL: https://issues.apache.org/jira/browse/ARROW-8942
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Dyfan Jones
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hi all,
> Apologies if this has already been covered by another ticket. Is it possible 
> for arrow to read compressed delimited files (for example, gzip)?
> Currently I get an error when trying to read a compressed delimited file:
>  
> {code:java}
> vroom::vroom_write(iris, "iris.csv.gz", delim = ",")
> arrow::read_csv_arrow("iris.csv.gz")
> # Error in csv__TableReader_Read(self) :
> # Invalid: CSV parse error: Expected 1 columns, got 4{code}
> however it can be read in by vroom and readr:
> {code:java}
> vroom::vroom("iris.csv.gz")
> readr::read_csv("iris.csv.gz")
> {code}
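Reading such a file requires detecting the compression before the text reaches the CSV parser. A minimal sketch of one way to do that, using only Python's standard library rather than Arrow's API: gzip data always begins with the two magic bytes 0x1f 0x8b (RFC 1952), so the sketch checks for them and decompresses first. Detecting by magic bytes is an assumption made here; keying off the `.gz` file extension is a common alternative.

```python
import csv
import gzip
import io
from typing import List

# Every gzip member begins with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def read_delimited(raw: bytes, delimiter: str = ",") -> List[List[str]]:
    """Parse delimited text, decompressing it first if it is gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # newline="" lets the csv module handle line endings itself.
    text = io.StringIO(raw.decode("utf-8"), newline="")
    return list(csv.reader(text, delimiter=delimiter))


plain = b"Sepal.Length,Species\n5.1,setosa\n"
# Compressed and uncompressed input parse to the same rows.
assert read_delimited(gzip.compress(plain)) == read_delimited(plain)
```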



[jira] [Updated] (ARROW-9119) [C++] Add support for building with system static gRPC

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9119:
--
Labels: pull-request-available  (was: )

> [C++] Add support for building with system static gRPC
> --
>
> Key: ARROW-9119
> URL: https://issues.apache.org/jira/browse/ARROW-9119
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9030) [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9030:
--
Labels: pull-request-available  (was: )

> [Python] Clean up some usages of pyarrow.compat, move some common 
> functions/symbols to lib.pyx
> --
>
> Key: ARROW-9030
> URL: https://issues.apache.org/jira/browse/ARROW-9030
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I started doing this while looking into ARROW-4633



[jira] [Updated] (ARROW-8510) [C++] arrow/dataset/file_base.cc fails to compile with internal compiler error with "Visual Studio 15 2017 Win64" generator

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8510:
--
Labels: pull-request-available  (was: )

> [C++] arrow/dataset/file_base.cc fails to compile with internal compiler 
> error with "Visual Studio 15 2017 Win64" generator
> ---
>
> Key: ARROW-8510
> URL: https://issues.apache.org/jira/browse/ARROW-8510
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I discovered this while running the release verification on Windows. The 
> failure was obscured by a separate issue: if the build fails, the verification 
> script continues anyway. I will fix that as well.



[jira] [Updated] (ARROW-9115) [C++] Process data buffers in batch in ascii_lower / ascii_upper kernels rather than using string_view value iteration

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9115:
--
Labels: pull-request-available  (was: )

> [C++] Process data buffers in batch in ascii_lower / ascii_upper kernels 
> rather than using string_view value iteration
> --
>
> Key: ARROW-9115
> URL: https://issues.apache.org/jira/browse/ARROW-9115
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Also add a benchmark
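The batch approach works because ASCII case mapping never changes a value's byte length: the whole data buffer can be transformed in one pass and the offsets buffer reused as-is, instead of slicing out and transforming each value separately. A minimal Python sketch contrasting the two strategies (the function names are hypothetical, and it assumes an unsliced array whose offsets start at 0 and end at `len(data)`):

```python
from typing import List, Tuple


def ascii_lower_per_value(offsets: List[int], data: bytes) -> Tuple[List[int], bytes]:
    """Lower-case each value separately, rebuilding the offsets as we go."""
    out = bytearray()
    new_offsets = [0]
    for start, end in zip(offsets, offsets[1:]):
        out += data[start:end].lower()
        new_offsets.append(len(out))
    return new_offsets, bytes(out)


def ascii_lower_batch(offsets: List[int], data: bytes) -> Tuple[List[int], bytes]:
    """Lower-case the whole data buffer in one pass.

    bytes.lower() only changes the ASCII letters A-Z and maps every byte to
    exactly one byte, so value boundaries do not move and the offsets can be
    reused unchanged.
    """
    return list(offsets), data.lower()


offsets, data = [0, 3, 3, 8], b"ABCHeLLo"
assert ascii_lower_batch(offsets, data) == ascii_lower_per_value(offsets, data)
```

In a real kernel the gain comes from avoiding per-value bookkeeping; a benchmark, as the ticket requests, is what would confirm it.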



[jira] [Updated] (ARROW-9079) [C++] Write benchmark for arithmetic kernels

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9079:
--
Labels: pull-request-available  (was: )

> [C++] Write benchmark for arithmetic kernels
> 
>
> Key: ARROW-9079
> URL: https://issues.apache.org/jira/browse/ARROW-9079
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The add kernel's implementation changed in 
> https://github.com/apache/arrow/pull/7341. To ensure that no performance 
> regression was introduced, write a benchmark for the kernels and compare the 
> results with the previous implementation.



[jira] [Updated] (ARROW-9113) Fix exception causes in cli.py

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9113:
--
Labels: pull-request-available  (was: )

> Fix exception causes in cli.py
> --
>
> Key: ARROW-9113
> URL: https://issues.apache.org/jira/browse/ARROW-9113
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Archery
>Reporter: Ram Rachum
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I recently went over 
> [Matplotlib](https://github.com/matplotlib/matplotlib/pull/16706), 
> [Pandas](https://github.com/pandas-dev/pandas/pull/32322) and 
> [NumPy](https://github.com/numpy/numpy/pull/15731), fixing a small mistake in 
> the way that Python 3's exception chaining is used. If you're interested, I 
> can do it here too. I've done it on just one file right now. 
> The mistake is this: In some parts of the code, an exception is being caught 
> and replaced with a more user-friendly error. In these cases the syntax 
> `raise new_error from old_error` needs to be used.
> Python 3's exception chaining means it shows not only the traceback of the 
> current exception, but that of the original exception (and possibly more.) 
> This is regardless of `raise from`. The usage of `raise from` tells Python to 
> put a more accurate message between the tracebacks. Instead of this:
> "During handling of the above exception, another exception occurred:"
> you'll get this:
> "The above exception was the direct cause of the following exception:"
> The first is inaccurate, because it signifies a bug in the exception-handling 
> code itself, which is a different situation from wrapping an exception.
> Let me know what you think! 
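A minimal sketch of the difference described above (the `PortError` type and the `parse_port_*` functions are hypothetical, not code from cli.py). In both versions the original `ValueError` stays attached to the new exception, but only `raise ... from` records it as the direct cause (`__cause__`), which is what changes the message printed between the two tracebacks.

```python
class PortError(Exception):
    """Hypothetical user-friendly error used only for this illustration."""


def parse_port_implicit(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        # Implicit chaining: the ValueError is stored as __context__, and the
        # traceback says "During handling of the above exception, ...".
        raise PortError(f"invalid port: {text!r}")


def parse_port_explicit(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:
        # Explicit chaining: the ValueError becomes __cause__, and the
        # traceback says "The above exception was the direct cause of ...".
        raise PortError(f"invalid port: {text!r}") from exc


try:
    parse_port_explicit("http")
except PortError as err:
    # The original ValueError is preserved as the direct cause.
    assert isinstance(err.__cause__, ValueError)
```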



[jira] [Updated] (ARROW-7028) [R] Date roundtrip results in different R storage mode

2020-06-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7028:
--
Labels: pull-request-available  (was: )

> [R] Date roundtrip results in different R storage mode
> --
>
> Key: ARROW-7028
> URL: https://issues.apache.org/jira/browse/ARROW-7028
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 0.15.0
>Reporter: Sascha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: image-2019-10-30-23-08-17-296.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When saving R data frames to Parquet and loading them again, the internal 
> representation of Dates changes (from double to integer), leading, e.g., to 
> errors when combining them in dplyr::if_else.
> {code}
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #> intersect, setdiff, setequal, union
> tmp = tempdir()
> dat = tibble(tag = as.Date("2018-01-01"))
> dat2 = tibble(tag2 = as.Date("2019-01-01"))
> arrow::write_parquet(dat, file.path(tmp, "dat.parquet"))
> dat = arrow::read_parquet(file.path(tmp, "dat.parquet"))
> typeof(dat$tag)
> #> [1] "integer"
> typeof(dat2$tag2)
> #> [1] "double"
> bind_cols(dat, dat2) %>%
>  mutate(comparison = if_else(TRUE, tag, tag2))
> #> `false` must be a `Date` object, not a `Date` object
> {code}
> Created on 2019-10-30 by the [reprex package](https://reprex.tidyverse.org) 
> (v0.3.0)



[jira] [Updated] (ARROW-6645) [Python] Dictionary indices are bounds-checked unconditionally in CategoricalBlock.to_pandas

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6645:
--
Labels: pull-request-available  (was: )

> [Python] Dictionary indices are bounds-checked unconditionally in 
> CategoricalBlock.to_pandas
> ---
>
> Key: ARROW-6645
> URL: https://issues.apache.org/jira/browse/ARROW-6645
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This was added at some point to fix a bug. I suspect we might want to move 
> this check somewhere else rather than do it every time {{to_pandas}} is called.



[jira] [Updated] (ARROW-9112) [R] Update autobrew script location

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9112:
--
Labels: pull-request-available  (was: )

> [R] Update autobrew script location
> ---
>
> Key: ARROW-9112
> URL: https://issues.apache.org/jira/browse/ARROW-9112
> Project: Apache Arrow
>  Issue Type: Task
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Jeroen is moving it to a different location.



[jira] [Updated] (ARROW-8826) [Crossbow] remote URL should always have .git

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8826:
--
Labels: pull-request-available  (was: )

> [Crossbow] remote URL should always have .git
> -
>
> Key: ARROW-8826
> URL: https://issues.apache.org/jira/browse/ARROW-8826
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Developer Tools
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In ARROW-7803, I edited the crossbow templates for the homebrew jobs to 
> substitute in the correct fork of arrow and append the current git SHA so 
> that the code under test corresponds to the requested git commit. 
> Unfortunately, this caused the nightly builds to fail. 
> Comparing a successful on-demand run 
> (https://github.com/ursa-labs/crossbow/blob/actions-266-travis-homebrew-r-autobrew/.travis.yml)
>  with a nightly run 
> (https://github.com/ursa-labs/crossbow/blob/nightly-2020-05-16-0-travis-homebrew-cpp/.travis.yml),
>  it appears that the default "remote" URL that crossbow uses when not on a 
> fork/PR does not contain the ".git" suffix. I suspect that Homebrew needs 
> that suffix to identify the source as a git repo and clone it correctly.
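A minimal sketch of the normalization the title asks for, assuming the remote URL is available as a plain string (the helper name `ensure_git_suffix` is hypothetical; Crossbow's actual code may differ):

```python
def ensure_git_suffix(url: str) -> str:
    """Return `url` with a trailing ".git", adding it only if it is missing."""
    # Drop trailing slashes first: "https://host/org/repo/" -> "https://host/org/repo".
    url = url.rstrip("/")
    return url if url.endswith(".git") else url + ".git"


print(ensure_git_suffix("https://github.com/apache/arrow"))
# https://github.com/apache/arrow.git
```

Making the function idempotent means it can be applied to both fork/PR remotes and the default remote without special-casing either.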



[jira] [Updated] (ARROW-971) [C++/Python] Implement Array.isvalid/notnull/isnull as scalar functions

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-971:
-
Labels: dataframe pull-request-available  (was: dataframe)

> [C++/Python] Implement Array.isvalid/notnull/isnull as scalar functions
> ---
>
> Key: ARROW-971
> URL: https://issues.apache.org/jira/browse/ARROW-971
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: dataframe, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For arrays with nulls, this amounts to returning the validity bitmap. Without 
> nulls, an array of all 1 bits must be constructed. For isnull, the bits must 
> be flipped (in this case, the un-set part of the new bitmap must stay 0, 
> though).
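A minimal Python sketch of the three cases above, using Arrow's least-significant-bit-first bitmap order. It assumes an unsliced array (offset 0) and reads "the un-set part of the new bitmap" as the padding bits past the array's length; the function names are hypothetical and this is not Arrow's C++ kernel.

```python
from typing import Optional


def _clear_padding(bitmap: bytearray, length: int) -> None:
    """Zero the unused high bits of the last byte (bit positions >= length)."""
    remainder = length % 8
    if remainder:
        bitmap[-1] &= (1 << remainder) - 1


def is_valid(validity: Optional[bytes], length: int) -> bytes:
    """Bitmap with bit i set when value i is non-null."""
    nbytes = (length + 7) // 8
    if validity is None:
        # No validity bitmap means no nulls: build an all-ones bitmap.
        out = bytearray(b"\xff" * nbytes)
    else:
        # With nulls, the result is a copy of the validity bitmap.
        out = bytearray(validity[:nbytes])
    _clear_padding(out, length)
    return bytes(out)


def is_null(validity: Optional[bytes], length: int) -> bytes:
    """Bitmap with bit i set when value i is null."""
    nbytes = (length + 7) // 8
    if validity is None:
        return bytes(nbytes)  # no nulls: all zeros
    # Flip every bit, then clear the padding bits that the flip turned on.
    out = bytearray(~b & 0xFF for b in validity[:nbytes])
    _clear_padding(out, length)
    return bytes(out)


validity = bytes([0b00000101])  # length 3: values 0 and 2 valid, value 1 null
print(bin(is_null(validity, 3)[0]))  # 0b10
```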



[jira] [Updated] (ARROW-8649) [Java] [Website] Java documentation on website is hidden

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8649:
--
Labels: pull-request-available  (was: )

> [Java] [Website] Java documentation on website is hidden
> 
>
> Key: ARROW-8649
> URL: https://issues.apache.org/jira/browse/ARROW-8649
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Website
>Reporter: Andy Grove
>Assignee: Neal Richardson
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is some excellent Java documentation on the web site that is hard to 
> find because the Java documentation link [1] goes straight to the generated 
> javadocs.
>  
>  [1] https://arrow.apache.org/docs/java



[jira] [Updated] (ARROW-9110) [C++] Fix CPU cache size detection on macOS

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9110:
--
Labels: pull-request-available  (was: )

> [C++] Fix CPU cache size detection on macOS
> ---
>
> Key: ARROW-9110
> URL: https://issues.apache.org/jira/browse/ARROW-9110
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Certain benchmarks never finish on macOS because CpuInfo reports the RAM 
> size as the size of the L1 cache.



[jira] [Updated] (ARROW-9101) [Doc][C++][Python] Document encoding expected by CSV and JSON readers

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9101:
--
Labels: pull-request-available  (was: )

> [Doc][C++][Python] Document encoding expected by CSV and JSON readers
> -
>
> Key: ARROW-9101
> URL: https://issues.apache.org/jira/browse/ARROW-9101
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9093) [FlightRPC][C++][Python] Allow setting gRPC client options

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9093:
--
Labels: pull-request-available  (was: )

> [FlightRPC][C++][Python] Allow setting gRPC client options
> --
>
> Key: ARROW-9093
> URL: https://issues.apache.org/jira/browse/ARROW-9093
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC, Python
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's no way to set generic gRPC options which are useful for tuning 
> behavior (e.g. round-robin load balancing). Rather than bind all of these one 
> by one, gRPC allows setting arguments as generic string-string or 
> string-integer pairs; we could expose this (and leave the interpretation 
> implementation-dependent).
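A minimal sketch of the "generic pairs" idea, assuming options are supplied as a list of `(key, value)` tuples. The helper `validate_generic_options` is hypothetical and not part of the Flight API; the two keys shown are standard gRPC channel argument names, and, as the issue suggests, their interpretation is left to gRPC itself.

```python
from typing import Iterable, List, Tuple, Union

# A generic option is a (key, value) pair whose value is a string or an integer.
GenericOption = Tuple[str, Union[str, int]]


def validate_generic_options(options: Iterable[Tuple[object, object]]) -> List[GenericOption]:
    """Check that every option is a string-string or string-integer pair.

    Values are passed through untouched; interpreting them is left to the
    underlying gRPC implementation.
    """
    validated: List[GenericOption] = []
    for key, value in options:
        if not isinstance(key, str):
            raise TypeError(f"option key must be str, got {type(key).__name__}")
        # bool is a subclass of int in Python; reject it to avoid surprises.
        if isinstance(value, bool) or not isinstance(value, (str, int)):
            raise TypeError(f"option {key!r} must have a str or int value")
        validated.append((key, value))
    return validated


opts = validate_generic_options([
    ("grpc.lb_policy_name", "round_robin"),         # string-string
    ("grpc.max_receive_message_length", 64 << 20),  # string-integer
])
```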



[jira] [Updated] (ARROW-7676) [Packaging][Python] Ensure that the static libraries are not built in the wheel scripts

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-7676:
--
Labels: pull-request-available  (was: )

> [Packaging][Python] Ensure that the static libraries are not built in the 
> wheel scripts
> ---
>
> Key: ARROW-7676
> URL: https://issues.apache.org/jira/browse/ARROW-7676
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The wheel scripts currently build the static libraries even though we don't 
> bundle them with the wheels.



[jira] [Updated] (ARROW-9102) [Packaging] Upload built manylinux docker images

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9102:
--
Labels: pull-request-available  (was: )

> [Packaging] Upload built manylinux docker images
> 
>
> Key: ARROW-9102
> URL: https://issues.apache.org/jira/browse/ARROW-9102
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Although the secrets were set on Azure Pipelines, the upload step is failing: 
> https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=13104=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=d9b15392-e4ce-5e4c-0c8c-b69645229181
> As a result, the manylinux builds take more than two hours. This is due to 
> Azure's secret handling: the Azure secret variables need to be exported 
> explicitly as environment variables.



[jira] [Updated] (ARROW-9100) Add ascii_lower kernel

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9100:
--
Labels: pull-request-available  (was: )

> Add ascii_lower kernel
> --
>
> Key: ARROW-9100
> URL: https://issues.apache.org/jira/browse/ARROW-9100
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Maarten Breddels
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9099) [C++][Gandiva] Add TRIM function for string

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9099:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Add TRIM function for string
> ---
>
> Key: ARROW-9099
> URL: https://issues.apache.org/jira/browse/ARROW-9099
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Sagnik Chakraborty
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9098) RecordBatch::ToStructArray cannot handle record batches with 0 columns

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9098:
--
Labels: pull-request-available  (was: )

> RecordBatch::ToStructArray cannot handle record batches with 0 columns
> -
>
> Key: ARROW-9098
> URL: https://issues.apache.org/jira/browse/ARROW-9098
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.17.1
>Reporter: Zhuo Peng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If RecordBatch::ToStructArray is called on a record batch with 0 columns, 
> the following error will be raised:
> Invalid: Can't infer struct array length with 0 child arrays
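The error arises because, with no child arrays, there is nothing from which to infer the struct array's length, whereas a record batch always knows its own row count. A minimal Python sketch of the idea, assuming the fix is to pass that row count through explicitly (the function and parameter names are hypothetical, not Arrow's API):

```python
from typing import Optional, Sequence


def infer_struct_length(child_lengths: Sequence[int],
                        num_rows: Optional[int] = None) -> int:
    """Return the length of a struct array built from its child arrays.

    With at least one child, the length can be taken from the children,
    which must all agree. With no children there is nothing to infer from,
    so an explicit row count (e.g. the record batch's num_rows) is required.
    """
    if num_rows is None:
        if not child_lengths:
            raise ValueError("Can't infer struct array length with 0 child arrays")
        num_rows = child_lengths[0]
    if any(n != num_rows for n in child_lengths):
        raise ValueError("child arrays have mismatched lengths")
    return num_rows


print(infer_struct_length([], num_rows=5))  # 5
```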



[jira] [Updated] (ARROW-9088) [Rust] Recent version of arrow crate does not compile into wasm target

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9088:
--
Labels: pull-request-available  (was: )

> [Rust] Recent version of arrow crate does not compile into wasm target
> --
>
> Key: ARROW-9088
> URL: https://issues.apache.org/jira/browse/ARROW-9088
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Sergey Todyshev
>Assignee: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> arrow 0.16 compiles successfully for the wasm32-unknown-unknown target, but 
> the recent git version does not. It would be nice to fix that.
> Compiler errors:
>  
> {noformat}
> error[E0433]: failed to resolve: could not find `unix` in `os`
>   --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
>    |
> 41 | use std::os::unix::ffi::OsStringExt;
>    |     could not find `unix` in `os`
>
> error[E0432]: unresolved import `unix`
>  --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
>   |
> 6 | use unix;
>   |     no `unix` in the root{noformat}
> The problem is that the prettytable-rs dependency depends on term, which 
> depends on dirs, and dirs causes this error.
> Consider making prettytable-rs a dev dependency.
>  



[jira] [Updated] (ARROW-9095) [Rust] Fix NullArray to comply with spec

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9095:
--
Labels: pull-request-available  (was: )

> [Rust] Fix NullArray to comply with spec
> 
>
> Key: ARROW-9095
> URL: https://issues.apache.org/jira/browse/ARROW-9095
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I implemented the NullArray, I didn't comply with the spec under the 
> premise that I'd handle reading and writing IPC in a spec-compliant way as 
> that looked like the easier approach.
> After some integration testing, I realised that I wasn't doing it correctly, 
> so it's better to comply with the spec by not allocating any buffers for the 
> array.
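A minimal sketch of what "not allocating any buffers" means for the Null layout, written in Rust with a hypothetical `NullArraySketch` type (not the arrow crate's `NullArray` API): the array stores only its length, every slot is null, and there is no validity bitmap or data buffer.

```rust
/// Hypothetical, simplified stand-in for a spec-compliant Null array.
/// This is not the arrow crate's API.
#[derive(Debug)]
struct NullArraySketch {
    len: usize,
}

impl NullArraySketch {
    fn new(len: usize) -> Self {
        // Only the length is stored: no validity bitmap, no data buffer.
        NullArraySketch { len }
    }

    fn len(&self) -> usize {
        self.len
    }

    /// Every slot of a Null array is null, so the null count equals the length.
    fn null_count(&self) -> usize {
        self.len
    }

    /// The Null layout has no physical buffers, so this is always empty.
    fn buffers(&self) -> Vec<&[u8]> {
        Vec::new()
    }

    fn is_null(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        true
    }
}
```

An IPC writer built on this shape would emit zero buffers for the field, which is what the spec expects.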





[jira] [Updated] (ARROW-9090) [C++] Bump versions of bundled libraries

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9090:
--
Labels: pull-request-available  (was: )

> [C++] Bump versions of bundled libraries
> 
>
> Key: ARROW-9090
> URL: https://issues.apache.org/jira/browse/ARROW-9090
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should bump the versions of bundled dependencies, wherever possible, to 
> ensure that users get bugfixes and improvements made in those third-party 
> libraries.





[jira] [Updated] (ARROW-9092) [C++] gandiva-decimal-test hangs with LLVM 9

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9092:
--
Labels: pull-request-available  (was: )

> [C++] gandiva-decimal-test hangs with LLVM 9
> 
>
> Key: ARROW-9092
> URL: https://issues.apache.org/jira/browse/ARROW-9092
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I built the Gandiva C++ unit tests with LLVM 9 on Ubuntu 18.04, and 
> gandiva-decimal-test hangs forever.





[jira] [Updated] (ARROW-9089) [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9089:
--
Labels: pull-request-available  (was: )

> [Python] A PyFileSystem handler for fsspec-based filesystems
> 
>
> Key: ARROW-9089
> URL: https://issues.apache.org/jira/browse/ARROW-9089
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up on ARROW-8766 to use this machinery to add an FSSpecHandler





[jira] [Updated] (ARROW-8785) [Python][Packaging] Build the windows wheels with MIMALLOC enabled

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8785:
--
Labels: pull-request-available  (was: )

> [Python][Packaging] Build the windows wheels with MIMALLOC enabled
> --
>
> Key: ARROW-8785
> URL: https://issues.apache.org/jira/browse/ARROW-8785
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The flag is already set, but there is a typo in it: ARROW_MIMA"ll"OC





[jira] [Updated] (ARROW-9087) Missing HDFS options parsing

2020-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9087:
--
Labels: pull-request-available  (was: )

> Missing HDFS options parsing
> 
>
> Key: ARROW-9087
> URL: https://issues.apache.org/jira/browse/ARROW-9087
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Yuan Zhou
>Assignee: Yuan Zhou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The HDFS options for the Kerberos ticket and the extra conf are not parsed.





[jira] [Updated] (ARROW-9086) [CI][Homebrew] Enable Gandiva

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9086:
--
Labels: pull-request-available  (was: )

> [CI][Homebrew] Enable Gandiva
> -
>
> Key: ARROW-9086
> URL: https://issues.apache.org/jira/browse/ARROW-9086
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-9085) [C++][CI] Appveyor CI test failures

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9085:
--
Labels: pull-request-available  (was: )

> [C++][CI] Appveyor CI test failures
> ---
>
> Key: ARROW-9085
> URL: https://issues.apache.org/jira/browse/ARROW-9085
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See 
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/33417919
> These seem to have been introduced by 
> https://github.com/apache/arrow/commit/b058cf0d1c26ad7984c104bb84322cc7dcc66f00





[jira] [Updated] (ARROW-9075) [C++] Optimize Filter implementation

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9075:
--
Labels: pull-request-available  (was: )

> [C++] Optimize Filter implementation
> 
>
> Key: ARROW-9075
> URL: https://issues.apache.org/jira/browse/ARROW-9075
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I split this off from ARROW-5760 





[jira] [Updated] (ARROW-9084) [C++] cmake is unable to find zstd target when ZSTD_SOURCE=SYSTEM

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9084:
--
Labels: pull-request-available  (was: )

> [C++] cmake is unable to find zstd target when ZSTD_SOURCE=SYSTEM
> -
>
> Key: ARROW-9084
> URL: https://issues.apache.org/jira/browse/ARROW-9084
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.1
> Environment: zstd 1.4.5
>Reporter: Dmitry Kalinkin
>Assignee: Dmitry Kalinkin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following problem occurs when arrow-cpp is built against the system zstd:
> {noformat}
> CMake Error at cmake_modules/ThirdpartyToolchain.cmake:1860 
> (get_target_property):
>   get_target_property() called with non-existent target "ZSTD::zstd".
> {noformat}





[jira] [Updated] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5377:
--
Labels: pull-request-available  (was: )

> [C++] Develop interface for writing a RecordBatch IPC stream into 
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> -
>
> Key: ARROW-5377
> URL: https://issues.apache.org/jira/browse/ARROW-5377
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in a recent mailing list thread:
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable process at the moment for getting an accurate report of 
> stream size is to write a simulated stream using {{MockOutputStream}}. This 
> is suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into IpcPayload must be performed twice
> It seems like an interface with a very constrained public API could be 
> provided to deconstruct a sequence of RecordBatches and report the size of 
> the produced IPC stream (based on metadata sizes, and padding), and then this 
> deconstructed set of IPC payloads can be written out to a stream (e.g. using 
> {{FixedSizeBufferWriter}})
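A rough sketch of the size calculation described above, written in Rust rather than C++. It assumes the encapsulated-message framing from the columnar format spec (a 4-byte 0xFFFFFFFF continuation marker, a 4-byte metadata length, Flatbuffers metadata padded to 8 bytes, then body buffers each padded to 8 bytes) plus the optional 8-byte end-of-stream marker. `PayloadSketch`, `pad8`, `message_size` and `stream_size` are hypothetical names, not Arrow C++ APIs, and a writer that pads buffers to 64 bytes would need a different rounding step.

```rust
/// Hypothetical description of one encapsulated IPC message (schema,
/// dictionary or record batch): the serialized Flatbuffers metadata length
/// and the unpadded lengths of its body buffers.
struct PayloadSketch {
    metadata_len: usize,
    body_buffer_lens: Vec<usize>,
}

/// Round `n` up to the next multiple of 8 bytes.
fn pad8(n: usize) -> usize {
    (n + 7) / 8 * 8
}

/// Bytes one message occupies: continuation marker + metadata length prefix,
/// the padded metadata, then every padded body buffer.
fn message_size(p: &PayloadSketch) -> usize {
    let prefix = 4 + 4;
    let body: usize = p.body_buffer_lens.iter().map(|&len| pad8(len)).sum();
    prefix + pad8(p.metadata_len) + body
}

/// Size of the whole stream, including the 8-byte end-of-stream marker.
fn stream_size(payloads: &[PayloadSketch]) -> usize {
    let eos = 8;
    payloads.iter().map(message_size).sum::<usize>() + eos
}
```

Because the size comes from metadata lengths and buffer lengths alone, the payloads only need to be assembled once; the same payloads could then be written into the pre-allocated region.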





[jira] [Updated] (ARROW-9071) [C++] MakeArrayOfNull makes invalid ListArray

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9071:
--
Labels: pull-request-available  (was: )

> [C++] MakeArrayOfNull makes invalid ListArray
> -
>
> Key: ARROW-9071
> URL: https://issues.apache.org/jira/browse/ARROW-9071
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Zhuo Peng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One way to reproduce this bug is:
>  
> >>> a = pa.array([[1, 2]])
> >>> b = pa.array([None, None], type=pa.null())
> >>> t1 = pa.Table.from_arrays([a], ["a"])
> >>> t2 = pa.Table.from_arrays([b], ["b"])
>  
> >>> pa.concat_tables([t1, t2], promote=True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/table.pxi", line 2138, in pyarrow.lib.concat_tables
>   File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table
>   File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 0: In chunk 1: Invalid: List child array 
> invalid: Invalid: Buffer #1 too small in array of type int64 and length 2: 
> expected at least 16 byte(s), got 12
> (This happens because concat_tables(promote=True) calls MakeArrayOfNulls: 
> [https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/table.cc#L647])
>  
> The code here seems incorrect:
> [https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/array/util.cc#L218]
> The length of the child array of a ListArray need not equal the length of 
> the ListArray.
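A minimal sketch of a layout an all-null list array can use, assuming 32-bit offsets and a hypothetical `ListLayoutSketch` type: every offset is 0, so the child array can be empty no matter how long the list array is. The `validate` helper is an illustrative subset of the structural checks, not Arrow's actual validation code.

```rust
/// Hypothetical, simplified list layout with 32-bit offsets.
struct ListLayoutSketch {
    length: usize,
    validity: Vec<u8>, // one bit per slot, LSB first; 0 = null
    offsets: Vec<i32>, // length + 1 entries
    child_len: usize,  // length of the child (values) array
}

/// All slots null and empty: offsets are all 0 and the child is empty,
/// independent of the list's own length.
fn all_null_list(length: usize) -> ListLayoutSketch {
    ListLayoutSketch {
        length,
        validity: vec![0u8; (length + 7) / 8],
        offsets: vec![0i32; length + 1],
        child_len: 0,
    }
}

/// A few structural checks: buffer sizes, monotonic offsets, and the last
/// offset not exceeding the child length.
fn validate(l: &ListLayoutSketch) -> Result<(), String> {
    if l.validity.len() * 8 < l.length {
        return Err("validity bitmap too small".to_string());
    }
    if l.offsets.len() < l.length + 1 {
        return Err(format!("need {} offsets, got {}", l.length + 1, l.offsets.len()));
    }
    if l.offsets[0] < 0 || l.offsets.windows(2).any(|w| w[0] > w[1]) {
        return Err("offsets must start at >= 0 and be non-decreasing".to_string());
    }
    let last = l.offsets[l.length] as usize;
    if last > l.child_len {
        return Err(format!("last offset {} exceeds child length {}", last, l.child_len));
    }
    Ok(())
}
```

The point matching the report is that the child length follows from the last offset, not from the list length.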





[jira] [Updated] (ARROW-8430) [CI] Configure self-hosted runners for Github Actions

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8430:
--
Labels: pull-request-available  (was: )

> [CI] Configure self-hosted runners for Github Actions
> -
>
> Key: ARROW-8430
> URL: https://issues.apache.org/jira/browse/ARROW-8430
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Set up Ubuntu C++ ARMv8 builders, and perhaps an AMD64 builder, to run on 
> self-hosted GitHub runners.





[jira] [Updated] (ARROW-9082) [Rust] - Stream reader fail when steam not ended with (optional) 0xFFFFFFFF 0x00000000"

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9082:
--
Labels: pull-request-available  (was: )

> [Rust] - Stream reader fail when steam not ended with (optional) 0xFFFFFFFF 
> 0x00000000"
> 
>
> Key: ARROW-9082
> URL: https://issues.apache.org/jira/browse/ARROW-9082
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.17.1
>Reporter: Eyal Leshem
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to the spec 
> ([https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format]), 
> the 0xFFFFFFFF 0x00000000 end-of-stream marker is optional in an Arrow stream, 
> but currently, when the client receives a stream without it, the reader reads 
> all the batches correctly and then returns an error at the end (instead of Ok(None)).
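A sketch of reading the message prefix so that a stream without the end-of-stream marker still ends cleanly, assuming the little-endian 0xFFFFFFFF continuation framing described in the spec (a 4-byte word that is not the continuation marker is treated as a legacy bare length). `read_word` and `next_message_len` are hypothetical helpers, not the arrow crate's `StreamReader` API; a real reader would go on to read the metadata and body after each `Some(len)`.

```rust
use std::io::{self, Read};

const CONTINUATION: u32 = 0xFFFF_FFFF;

/// Read one 4-byte word. Returns Ok(None) only if the input ends before any
/// byte of the word was read; a partially read word is an error.
fn read_word<R: Read>(reader: &mut R) -> io::Result<Option<[u8; 4]>> {
    let mut buf = [0u8; 4];
    let mut filled = 0;
    while filled < 4 {
        match reader.read(&mut buf[filled..]) {
            // Clean end of input between messages.
            Ok(0) if filled == 0 => return Ok(None),
            // Input ended part-way through the word: truncated stream.
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "truncated message prefix",
                ))
            }
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(Some(buf))
}

/// Metadata length of the next message, or Ok(None) at end of stream,
/// whether signalled by the explicit marker or by plain end of input.
fn next_message_len<R: Read>(reader: &mut R) -> io::Result<Option<u32>> {
    let word = match read_word(reader)? {
        Some(w) => w,
        None => return Ok(None), // the end-of-stream marker is optional
    };
    let mut len = u32::from_le_bytes(word);
    if len == CONTINUATION {
        let word = read_word(reader)?.ok_or_else(|| {
            io::Error::new(io::ErrorKind::UnexpectedEof, "missing metadata length")
        })?;
        len = u32::from_le_bytes(word);
    }
    if len == 0 {
        return Ok(None); // explicit 0xFFFFFFFF 0x00000000 marker
    }
    Ok(Some(len))
}
```

Distinguishing "no bytes at all" from "some bytes, then EOF" keeps genuinely truncated streams reported as errors.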





[jira] [Updated] (ARROW-9074) [GLib] Add missing arrow-json check

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9074:
--
Labels: pull-request-available  (was: )

> [GLib] Add missing arrow-json check
> ---
>
> Key: ARROW-9074
> URL: https://issues.apache.org/jira/browse/ARROW-9074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5760) [C++] Optimize Take implementation

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5760:
--
Labels: pull-request-available  (was: )

> [C++] Optimize Take implementation
> --
>
> Key: ARROW-5760
> URL: https://issues.apache.org/jira/browse/ARROW-5760
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is some question of whether these kernels allocate optimally; for 
> example when Filtering or Taking strings it might be more efficient to pass 
> over the filter/indices twice, first to determine how much character storage 
> will be needed then again into allocated memory: 
> https://github.com/apache/arrow/pull/4531#discussion_r297160457
> Additionally, these kernels could probably make good use of scatter/gather 
> SIMD instructions.
> Furthermore, Filter's bitmap is currently lazily expanded into the indices of 
> elements to be appended to the output array. It would probably be more 
> efficient to expand to indices in batches, then gather using an index batch.
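A minimal sketch of the two-pass idea for strings, assuming a simplified `StringColumn` with `usize` offsets and no nulls (both hypothetical simplifications, not Arrow's `StringArray`): the first pass over `indices` sums the selected byte lengths, so the value buffer is allocated once before the second pass copies the data.

```rust
/// Hypothetical, simplified string column: `offsets` has len + 1 entries and
/// slot i spans values[offsets[i]..offsets[i + 1]]. Nulls are not modelled.
struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

/// Take with two passes over `indices`.
fn take_strings(col: &StringColumn, indices: &[usize]) -> StringColumn {
    // Pass 1: exact number of character bytes the output needs.
    let total_bytes: usize = indices
        .iter()
        .map(|&i| col.offsets[i + 1] - col.offsets[i])
        .sum();

    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::with_capacity(total_bytes);
    offsets.push(0);

    // Pass 2: gather into buffers that never need to grow.
    for &i in indices {
        let (start, end) = (col.offsets[i], col.offsets[i + 1]);
        values.extend_from_slice(&col.values[start..end]);
        offsets.push(values.len());
    }
    StringColumn { offsets, values }
}
```

The trade-off is reading the offsets twice in exchange for avoiding repeated reallocation of the value buffer.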





[jira] [Updated] (ARROW-8726) [C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file name as value

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8726:
--
Labels: dataset pull-request-available  (was: dataset)

> [C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file 
> name as value
> --
>
> Key: ARROW-8726
> URL: https://issues.apache.org/jira/browse/ARROW-8726
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Jonathan Keane
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calling filter + collect on a dataset with a mis-specified partitioning 
> causes a segfault. Though this is clearly input error, it would be nice if 
> there was some guidance that something was wrong with the partitioning.
> {code:r}
> library(arrow)
> library(dplyr)
> dir.create("multi_mtcars/one", recursive = TRUE)
> dir.create("multi_mtcars/two", recursive = TRUE)
> write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
> write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
> ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
> # the following will segfault
> ds %>%
>   filter(cyl > 8) %>% 
>   collect()
> {code}





[jira] [Updated] (ARROW-9073) [C++] RapidJSON include directory detection doesn't work with RapidJSONConfig.cmake

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9073:
--
Labels: pull-request-available  (was: )

> [C++] RapidJSON include directory detection doesn't work with 
> RapidJSONConfig.cmake
> ---
>
> Key: ARROW-9073
> URL: https://issues.apache.org/jira/browse/ARROW-9073
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-9077) [C++] Fix aggregate/scalar-compare benchmark null_percent calculation

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9077:
--
Labels: pull-request-available  (was: )

> [C++] Fix aggregate/scalar-compare benchmark null_percent calculation
> -
>
> Key: ARROW-9077
> URL: https://issues.apache.org/jira/browse/ARROW-9077
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Frank Du
>Assignee: Frank Du
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The null percentage reported by the aggregate and scalar-compare benchmarks is 
> wrong because of the changes in benchmark_util.h. Correct both to use the newly 
> defined boilerplate.
> ./release/arrow-compute-aggregate-benchmark
> ------------------------------------------------------------------------------------------
> Benchmark                  Time     CPU      Iterations  UserCounters...
> ------------------------------------------------------------------------------------------
> SumKernelFloat/32768/1     5.38 us  5.38 us  129832      bytes_per_second=5.67524G/s null_percent=10k size=32.768k
> SumKernelFloat/32768/1000  5.36 us  5.35 us  130069      bytes_per_second=5.6994G/s null_percent=1000 size=32.768k
> SumKernelFloat/32768/100   5.35 us  5.35 us  131071      bytes_per_second=5.70903G/s null_percent=100 size=32.768k
> SumKernelFloat/32768/50    10.8 us  10.7 us  65504       bytes_per_second=2.84073G/s null_percent=50 size=32.768k
> SumKernelFloat/32768/10    4.94 us  4.93 us  141624      bytes_per_second=6.18964G/s null_percent=10 size=32.768k
> SumKernelFloat/32768/1     4.41 us  4.40 us  158949      bytes_per_second=6.92913G/s null_percent=1 size=32.768k





[jira] [Updated] (ARROW-8866) [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8866:
--
Labels: pull-request-available  (was: )

> [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION
> -
>
> Key: ARROW-8866
> URL: https://issues.apache.org/jira/browse/ARROW-8866
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ben Kietzman
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Similar to the recent {{Type::INTERVAL}} split, having these two array types, 
> which have different memory layouts, under the same {{Type::type}} value makes 
> function dispatch somewhat more complicated. This issue is less critical than 
> INTERVAL, so it may not be urgent, but it seems like a good pre-1.0 change.
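A small sketch of why one type id per layout simplifies dispatch, using a hypothetical `TypeIdSketch` enum rather than Arrow's `Type::type`: the union's extra buffers follow from the id alone (sparse unions carry type ids; dense unions carry type ids plus offsets), with no secondary sparse/dense mode to inspect. Validity bitmaps are left out for brevity.

```rust
/// Hypothetical type ids with one id per physical union layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TypeIdSketch {
    Int32,
    SparseUnion,
    DenseUnion,
}

const SPARSE_UNION_BUFFERS: &[&str] = &["type_ids"];
const DENSE_UNION_BUFFERS: &[&str] = &["type_ids", "offsets"];

/// Buffers a union array carries besides its children, decided by the id
/// alone; non-union ids return None in this sketch.
fn union_buffers(id: TypeIdSketch) -> Option<&'static [&'static str]> {
    match id {
        TypeIdSketch::SparseUnion => Some(SPARSE_UNION_BUFFERS),
        TypeIdSketch::DenseUnion => Some(DENSE_UNION_BUFFERS),
        TypeIdSketch::Int32 => None,
    }
}
```

With a single shared `UNION` id, the same function would also need the union mode as an argument, and every kernel dispatching on the id would inherit that extra branch.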





[jira] [Updated] (ARROW-9062) [Rust] Support to read JSON into dictionary type

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9062:
--
Labels: pull-request-available  (was: )

> [Rust] Support to read JSON into dictionary type
> 
>
> Key: ARROW-9062
> URL: https://issues.apache.org/jira/browse/ARROW-9062
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Rust
>Reporter: Sven Wagner-Boysen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, a JSON reader built from a schema that uses the dictionary type for 
> one of its fields fails with JsonError("struct types are not yet supported"):
> {code:java}
> let builder = ReaderBuilder::new().with_schema(..);
> let mut reader: Reader<File> = 
> builder.build::<File>(File::open(path).unwrap()).unwrap();
> let rb = reader.next().unwrap();
> {code}
>  
> Suggested solution:
> Support reading into a dictionary in the JSON reader: 
> [https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L368]
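A sketch of the dictionary-encoding step that such support would need for a dictionary-typed string field, under the assumption of Int32 keys. `dictionary_encode` is a hypothetical helper, not part of the arrow crate's JSON reader: distinct strings get keys in first-seen order, and JSON nulls stay null keys.

```rust
use std::collections::HashMap;

/// Map each string to a key (its index in `values`); None stays a null key.
fn dictionary_encode(input: &[Option<&str>]) -> (Vec<Option<i32>>, Vec<String>) {
    let mut lookup: HashMap<&str, i32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys = Vec::with_capacity(input.len());
    for &item in input {
        let key = item.map(|s| {
            // Reuse the existing key, or append a new dictionary value.
            *lookup.entry(s).or_insert_with(|| {
                values.push(s.to_string());
                (values.len() - 1) as i32
            })
        });
        keys.push(key);
    }
    (keys, values)
}
```

The keys would become the Int32 key array and `values` the Utf8 dictionary of the resulting dictionary array.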





[jira] [Updated] (ARROW-9066) [Python] Raise correct error in isnull()

2020-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9066:
--
Labels: pull-request-available  (was: )

> [Python] Raise correct error in isnull()
> 
>
> Key: ARROW-9066
> URL: https://issues.apache.org/jira/browse/ARROW-9066
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-9064) optimization debian package manager tweaks

2020-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9064:
--
Labels: pull-request-available  (was: )

> optimization debian package manager tweaks
> --
>
> Key: ARROW-9064
> URL: https://issues.apache.org/jira/browse/ARROW-9064
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Pratik Raj
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By default, "apt" and "apt-get" on Ubuntu and other Debian-based systems install 
> recommended (but not suggested) packages.
> Passing the "--no-install-recommends" option tells apt-get not to treat 
> recommended packages as dependencies to install.
> This results in smaller downloads and fewer installed packages.
> See the Ubuntu Blog post: 
> https://ubuntu.com/blog/we-reduced-our-docker-images-by-60-with-no-install-recommends





[jira] [Updated] (ARROW-8974) [C++] Refine TransferBitmap template parameters

2020-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8974:
--
Labels: pull-request-available  (was: )

> [C++] Refine TransferBitmap template parameters
> ---
>
> Key: ARROW-8974
> URL: https://issues.apache.org/jira/browse/ARROW-8974
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Yibo Cai
>Assignee: Yibo Cai
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [TransferBitmap|https://github.com/apache/arrow/blob/44e723d9ac7c64739d419ad66618d2d56003d1b7/cpp/src/arrow/util/bit_util.cc#L110]
>  has two template parameters of bool type with four combinations.
> Changing them to function parameters can reduce code size. I don't expect 
> "restore_trailing_bits" to affect performance; "invert_bits" needs a 
> benchmark.
> Also, bool parameters are hard to interpret at the [caller 
> side|https://github.com/apache/arrow/blob/44e723d9ac7c64739d419ad66618d2d56003d1b7/cpp/src/arrow/util/bit_util.cc#L208],
>  so it would be better to use meaningful defines.
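A Rust sketch of the runtime-parameter alternative, with a hypothetical `BitMode` enum standing in for the "meaningful defines" so that a call site names the mode instead of passing a bare `true`/`false`. It moves one bit at a time for clarity, whereas the C++ `TransferBitmap` works on whole words, so it says nothing about the performance question raised above. Since only the target bits are written, trailing bits in the destination are preserved without a separate flag.

```rust
/// Named mode instead of a bare bool at the call site.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BitMode {
    Copy,
    Invert,
}

/// Read bit `i` of an LSB-first bitmap (the Arrow validity-bitmap order).
fn get_bit(bits: &[u8], i: usize) -> bool {
    bits[i / 8] & (1u8 << (i % 8)) != 0
}

/// Write bit `i` without touching any other bit of the byte.
fn set_bit(bits: &mut [u8], i: usize, value: bool) {
    if value {
        bits[i / 8] |= 1u8 << (i % 8);
    } else {
        bits[i / 8] &= !(1u8 << (i % 8));
    }
}

/// Copy `length` bits from `src` (starting at `src_offset`) into `dest`
/// (starting at `dest_offset`), inverting them if `mode` is `Invert`.
fn transfer_bitmap(
    src: &[u8],
    src_offset: usize,
    length: usize,
    dest: &mut [u8],
    dest_offset: usize,
    mode: BitMode,
) {
    for i in 0..length {
        let bit = get_bit(src, src_offset + i);
        let bit = match mode {
            BitMode::Copy => bit,
            BitMode::Invert => !bit,
        };
        set_bit(dest, dest_offset + i, bit);
    }
}
```

A call such as `transfer_bitmap(&src, 0, 3, &mut dest, 0, BitMode::Invert)` documents its intent where `..., true)` would not.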





[jira] [Updated] (ARROW-9061) [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9061:
--
Labels: pull-request-available  (was: )

> [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib
> --
>
> Key: ARROW-9061
> URL: https://issues.apache.org/jira/browse/ARROW-9061
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib, Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-8736) [Rust] [DataFusion] Table API should provide a schema() method

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8736:
--
Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Table API should provide a schema() method
> --
>
> Key: ARROW-8736
> URL: https://issues.apache.org/jira/browse/ARROW-8736
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Table API should provide a schema() method. Currently, it is not possible to 
> examine the schema of a registered table without getting it via the logical 
> schema, which isn't intuitive.





[jira] [Updated] (ARROW-9060) [GLib] Add support for building Apache Arrow Datasets GLib with non-installed Apache Arrow Datasets

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9060:
--
Labels: pull-request-available  (was: )

> [GLib] Add support for building Apache Arrow Datasets GLib with non-installed 
> Apache Arrow Datasets
> ---
>
> Key: ARROW-9060
> URL: https://issues.apache.org/jira/browse/ARROW-9060
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It's required for packaging: 
> https://travis-ci.org/github/ursa-labs/crossbow/builds/695595159
> {noformat}
>   CXX  libarrow_dataset_glib_la-scanner.lo
> scanner.cpp:24:33: fatal error: arrow/util/iterator.h: No such file or 
> directory
>  #include <arrow/util/iterator.h>
>  ^
> {noformat}





[jira] [Updated] (ARROW-9059) [Rust] Documentation for slicing array data has the wrong sign

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9059:
--
Labels: pull-request-available  (was: )

> [Rust] Documentation for slicing array data has the wrong sign
> --
>
> Key: ARROW-9059
> URL: https://issues.apache.org/jira/browse/ARROW-9059
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Bobby Wagner
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the slice_data function in array.rs, the docstring says it panics if 
> offset + length is less than data.len(), but the code actually panics if 
> offset + length is greater than data.len().
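A minimal sketch of the behaviour the docstring should describe, using a hypothetical generic `slice_data` over a plain slice rather than the arrow crate's `ArrayData`: it panics only when `offset + length` is greater than the data length, so slicing right up to the end is allowed.

```rust
/// Hypothetical stand-in for the slicing described above.
///
/// # Panics
///
/// Panics if `offset + length` is greater than `data.len()`.
fn slice_data<T>(data: &[T], offset: usize, length: usize) -> &[T] {
    assert!(
        offset + length <= data.len(),
        "offset + length must not exceed the data length"
    );
    &data[offset..offset + length]
}
```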





[jira] [Updated] (ARROW-9058) [Packaging][wheel] Boost download is failed

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9058:
--
Labels: pull-request-available  (was: )

> [Packaging][wheel] Boost download is failed
> ---
>
> Key: ARROW-9058
> URL: https://issues.apache.org/jira/browse/ARROW-9058
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=12893=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=d9b15392-e4ce-5e4c-0c8c-b69645229181
> {noformat}
> + curl -sL 
> https://dl.bintray.com/boostorg/release/1.68.0/source/boost_1_68_0.tar.gz -o 
> /boost_1_68_0.tar.gz
> + tar xf boost_1_68_0.tar.gz
> tar: This does not look like a tar archive
> tar: Error exit delayed from previous errors
> {noformat}





[jira] [Updated] (ARROW-9057) Projection should work on InMemoryScan without error

2020-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9057:
--
Labels: pull-request-available  (was: )

> Projection should work on InMemoryScan without error
> 
>
> Key: ARROW-9057
> URL: https://issues.apache.org/jira/browse/ARROW-9057
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-8781) [CI][C++] Enable ccache on GHA MinGW jobs

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8781:
--
Labels: pull-request-available  (was: )

> [CI][C++] Enable ccache on GHA MinGW jobs
> -
>
> Key: ARROW-8781
> URL: https://issues.apache.org/jira/browse/ARROW-8781
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be nice to enable caching with ccache on the MinGW GitHub Actions 
> jobs. They're currently quite slow...





[jira] [Updated] (ARROW-9050) [Release] Use 1.0.0 as the next version

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9050:
--
Labels: pull-request-available  (was: )

> [Release] Use 1.0.0 as the next version
> ---
>
> Key: ARROW-9050
> URL: https://issues.apache.org/jira/browse/ARROW-9050
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9052) [CI][MinGW] Enable Gandiva

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9052:
--
Labels: pull-request-available  (was: )

> [CI][MinGW] Enable Gandiva
> --
>
> Key: ARROW-9052
> URL: https://issues.apache.org/jira/browse/ARROW-9052
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva, Continuous Integration, GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9047:
--
Labels: pull-request-available  (was: )

> [Rust] Setting 0-bits of a 0-length bitset segfaults
> 
>
> Key: ARROW-9047
> URL: https://issues.apache.org/jira/browse/ARROW-9047
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Max Burke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See PR for details
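
The PR itself is not reproduced here. The sketch below is a hedged illustration only, and it assumes the crash comes from indexing the first byte of an empty buffer. It shows a bit-range setter that treats a zero-length request as a no-op before touching the buffer. `set_bits_range` is a hypothetical helper, not the Rust crate's API. Bits are numbered LSB-first, as in Arrow bitmaps.

```python
def set_bits_range(bitmap: bytearray, start: int, length: int, value: bool) -> None:
    """Set (value=True) or clear (value=False) `length` bits starting at bit `start`.

    Bits are numbered LSB-first within each byte, as in Arrow bitmaps.
    """
    # Zero-length range: return before computing any byte index, so an
    # empty bitmap is never indexed.
    if length == 0:
        return
    if start < 0 or length < 0 or start + length > len(bitmap) * 8:
        raise IndexError("bit range out of bounds")
    for i in range(start, start + length):
        if value:
            bitmap[i // 8] |= 1 << (i % 8)
        else:
            bitmap[i // 8] &= ~(1 << (i % 8)) & 0xFF
```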



[jira] [Updated] (ARROW-9007) [Rust] Support appending arrays by merging array data

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9007:
--
Labels: pull-request-available  (was: )

> [Rust] Support appending arrays by merging array data
> -
>
> Key: ARROW-9007
> URL: https://issues.apache.org/jira/browse/ARROW-9007
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Neville Dipale
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-9005 introduces a concat kernel which allows for concatenating multiple 
> arrays of the same type into a single array. This is useful for sorting on 
> multiple arrays, among other things.
> The concat kernel is implemented for most array types, but not yet for nested 
> arrays (lists, structs, etc.).
> This Jira is for creating a way of appending/merging all array types, so that 
> concat (and functionality that depends on it) can support all array types.
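
For nested types, the extra work is that child data must be appended and the parent's offsets rebased. Below is a minimal Python sketch for list arrays. It assumes a simplified layout of plain Python lists (an `offsets` list plus a flat `values` list) and ignores validity bitmaps. `concat_list_arrays` is a hypothetical name, not the Rust API.

```python
from typing import List, Sequence, Tuple

# Simplified, hypothetical list-array model: (offsets, flat child values).
ListArray = Tuple[List[int], List[int]]


def concat_list_arrays(arrays: Sequence[ListArray]) -> ListArray:
    """Concatenate list arrays by appending child values and rebasing offsets."""
    out_offsets: List[int] = [0]
    out_values: List[int] = []
    for offsets, values in arrays:
        # Copy only the child slice this (possibly sliced) array references.
        start, end = offsets[0], offsets[-1]
        base = len(out_values) - start
        out_values.extend(values[start:end])
        # Shift each end offset so it points into the merged child array.
        out_offsets.extend(o + base for o in offsets[1:])
    return out_offsets, out_values
```

A full implementation would also concatenate the validity bitmaps. It would recurse into the child array whenever that child is itself nested.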



[jira] [Updated] (ARROW-6917) [Developer] Implement Python script to generate git cherry-pick commands needed to create patch build branch for maint releases

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6917:
--
Labels: pull-request-available  (was: )

> [Developer] Implement Python script to generate git cherry-pick commands 
> needed to create patch build branch for maint releases
> ---
>
> Key: ARROW-6917
> URL: https://issues.apache.org/jira/browse/ARROW-6917
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For 0.14.1, I maintained this script by hand. It would probably be less 
> failure-prone to generate it from the fix versions set in JIRA.
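
Here is a hedged Python sketch of the idea. It assumes two inputs already exist: the set of issue keys whose fix version is the patch release, fetched from JIRA, and the commits parsed from `git log` on the development branch. `Commit` and `cherry_pick_commands` are hypothetical names, and this is not the script that was eventually written.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Commit:
    sha: str
    jira_key: str   # e.g. "ARROW-1234", parsed from the commit subject line
    timestamp: int  # commit time in seconds since the epoch


def cherry_pick_commands(commits: Iterable[Commit], fix_keys: Set[str]) -> List[str]:
    """Return `git cherry-pick` commands for commits whose JIRA key is in `fix_keys`."""
    selected = [c for c in commits if c.jira_key in fix_keys]
    # Apply oldest first, matching the order in which the commits landed,
    # which tends to minimize conflicts.
    selected.sort(key=lambda c: c.timestamp)
    return [f"git cherry-pick {c.sha}" for c in selected]
```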



[jira] [Updated] (ARROW-9051) [GLib] Refer Array related objects from Array

2020-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9051:
--
Labels: pull-request-available  (was: )

> [GLib] Refer Array related objects from Array
> -
>
> Key: ARROW-9051
> URL: https://issues.apache.org/jira/browse/ARROW-9051
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9046) [C++][R] Put more things in type_fwds

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9046:
--
Labels: pull-request-available  (was: )

> [C++][R] Put more things in type_fwds
> -
>
> Key: ARROW-9046
> URL: https://issues.apache.org/jira/browse/ARROW-9046
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, R
>Reporter: Neal Richardson
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The hope is that this reduces compile time.



[jira] [Updated] (ARROW-8723) [Rust] Remove SIMD specific benchmark code

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8723:
--
Labels: pull-request-available  (was: )

> [Rust] Remove SIMD specific benchmark code
> --
>
> Key: ARROW-8723
> URL: https://issues.apache.org/jira/browse/ARROW-8723
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that SIMD is behind a feature flag, it's trivial to compare SIMD vs. 
> non-SIMD builds, so the SIMD-specific versions of the benchmarks can be removed.



[jira] [Updated] (ARROW-9045) [C++] Improve and expand Take/Filter benchmarks

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9045:
--
Labels: pull-request-available  (was: )

> [C++] Improve and expand Take/Filter benchmarks
> ---
>
> Key: ARROW-9045
> URL: https://issues.apache.org/jira/browse/ARROW-9045
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm putting this up as a separate patch for review



[jira] [Updated] (ARROW-555) [C++] String algorithm library for StringArray/BinaryArray

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-555:
-
Labels: Analytics pull-request-available  (was: Analytics)

> [C++] String algorithm library for StringArray/BinaryArray
> --
>
> Key: ARROW-555
> URL: https://issues.apache.org/jira/browse/ARROW-555
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: Analytics, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a parent JIRA for starting a module for processing strings in-memory 
> arranged in Arrow format. This will include using the re2 C++ regular 
> expression library and other standard string manipulations (such as those 
> found on Python's string objects).
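
A string kernel of this kind works directly on Arrow's variable-width layout, which is an offsets buffer plus one contiguous UTF-8 data buffer. Below is a minimal Python sketch that models those buffers with plain lists and bytes. `ascii_upper` is a hypothetical name, and a real kernel would also carry the validity bitmap along.

```python
from typing import List, Tuple


def ascii_upper(offsets: List[int], data: bytes) -> Tuple[List[int], bytes]:
    """Uppercase every string in an (offsets, data) StringArray-like layout.

    ASCII-only transform: byte lengths do not change, so offsets only need
    rebasing. Non-ASCII UTF-8 bytes (>= 0x80) are passed through unchanged.
    """
    # Process the whole referenced data region in one pass, not per string.
    start, end = offsets[0], offsets[-1]
    out = bytes(b - 32 if 97 <= b <= 122 else b for b in data[start:end])
    # Rebase offsets so the output data buffer starts at 0.
    return [o - start for o in offsets], out
```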



[jira] [Updated] (ARROW-9034) [C++] Implement binary (two bitmap) version of BitBlockCounter

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9034:
--
Labels: pull-request-available  (was: )

> [C++] Implement binary (two bitmap) version of BitBlockCounter
> --
>
> Key: ARROW-9034
> URL: https://issues.apache.org/jira/browse/ARROW-9034
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current BitBlockCounter from ARROW-9029 is useful for unary operations. 
> Some operations involve multiple bitmaps, so it's useful to be able to 
> determine the block popcounts of the AND of the respective words in the 
> bitmaps. Each returned block would then contain the number of bits that are 
> set in both bitmaps at the same positions.
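
The binary counter pairs the words of the two bitmaps, ANDs them, and popcounts the result. Below is a minimal Python sketch of that behaviour. It assumes both bitmaps start at bit offset 0 and uses hypothetical names (`BitBlockCount`, `binary_bit_blocks`), not the C++ API.

```python
from typing import Iterator, NamedTuple


class BitBlockCount(NamedTuple):
    length: int    # number of bits covered by this block
    popcount: int  # number of positions set in BOTH bitmaps


def binary_bit_blocks(left: bytes, right: bytes, length: int,
                      block_bits: int = 64) -> Iterator[BitBlockCount]:
    """Yield per-block popcounts of (left AND right), `block_bits` bits at a time.

    Assumes both bitmaps start at bit offset 0 and block_bits is a multiple of 8.
    """
    word_bytes = block_bits // 8
    for pos in range(0, length, block_bits):
        n = min(block_bits, length - pos)
        lo = pos // 8
        a = int.from_bytes(left[lo:lo + word_bytes], "little")
        b = int.from_bytes(right[lo:lo + word_bytes], "little")
        mask = (1 << n) - 1  # ignore padding bits past `length` in the last block
        yield BitBlockCount(n, bin(a & b & mask).count("1"))
```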



[jira] [Updated] (ARROW-9043) [Go] Temporarily copy LICENSE.txt to go/

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9043:
--
Labels: pull-request-available  (was: )

> [Go] Temporarily copy LICENSE.txt to go/
> 
>
> Key: ARROW-9043
> URL: https://issues.apache.org/jira/browse/ARROW-9043
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{go mod}} needs to find a license file in the root of the Go module. In the 
> future, {{go mod}} may be able to follow symlinks, in which case this copy can 
> be replaced by a symlink.



[jira] [Updated] (ARROW-9042) [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior

2020-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9042:
--
Labels: pull-request-available  (was: )

> [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
> 
>
> Key: ARROW-9042
> URL: https://issues.apache.org/jira/browse/ARROW-9042
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Also avoid undefined behaviour caused by signed integer overflow.
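
In C++, one common way to get defined wrap-around is to do the arithmetic in the unsigned type of the same width, where overflow is defined modulo 2^N, and then convert back. The Python sketch below models only the resulting two's-complement semantics for 32-bit integers. The helper names are hypothetical.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce an unbounded Python int to a two's-complement signed `bits`-bit int."""
    value &= (1 << bits) - 1          # keep the low `bits` bits (arithmetic mod 2**bits)
    if value >= 1 << (bits - 1):      # reinterpret the top bit as the sign bit
        value -= 1 << bits
    return value


def subtract_wrap(a: int, b: int, bits: int = 32) -> int:
    return wrap(a - b, bits)


def multiply_wrap(a: int, b: int, bits: int = 32) -> int:
    return wrap(a * b, bits)
```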



[jira] [Updated] (ARROW-8909) [Java] Out of order writes using setSafe

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8909:
--
Labels: pull-request-available  (was: )

> [Java] Out of order writes using setSafe
> 
>
> Key: ARROW-8909
> URL: https://issues.apache.org/jira/browse/ARROW-8909
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Saurabh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I noticed that calling setSafe on a VarCharVector with indices not in 
> increasing order causes the lastIndex to be set to the index in the last call 
> to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.BufferAllocator;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
>
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(
>         Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
>     // Close both the allocator and the root when done.
>     try (BufferAllocator allocator = new RootAllocator();
>          VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, allocator)) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       // Write indices 0..9 in increasing order.
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
>       }
>       // Out-of-order write: overwrite index 7 after index 9 has been written.
>       vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
>       log.info("Data at index 8 Before {}", vec.getObject(8));
>       vroot.setRowCount(10);
>       log.info("Data at index 8 After {}", vec.getObject(8));
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>
> If I don't set index 7 after the loop, I get all the 0_mtest, 1_mtest, 
> ..., 9_mtest entries.
> If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new, and:
>     Before setRowCount, the data at index 8 is *st8_mtest* and at index 9 
> it is *9_mtest*.
>     After setRowCount, the data at index 8 is "" and at index 9 it is "".
> With a replacement text longer than the 4 characters of "_new", the write keeps 
> eating into the data at the following indices.
>
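
The toy Python model below reproduces the reported values. It is hypothetical and simplified, not Arrow Java's code, and it assumes three things about the real vector. First, in an offsets-plus-data layout, writing value i stores the bytes at offsets[i] and moves only offsets[i+1]. Second, each write records i as the last-set index. Third, setting the value/row count later treats every slot after that index as unwritten and makes it empty.

```python
class VarWidthModel:
    """Toy offsets + data model with a `last_set` marker (hypothetical, simplified)."""

    def __init__(self, capacity: int) -> None:
        self.offsets = [0] * (capacity + 1)
        self.data = bytearray()
        self.last_set = -1

    def set_safe(self, index: int, value: bytes) -> None:
        # Fill skipped slots with empty values so offsets stay monotonic.
        for j in range(self.last_set + 1, index):
            self.offsets[j + 1] = self.offsets[j]
        start = self.offsets[index]
        end = start + len(value)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[start:end] = value   # overwrite in place at this slot's start offset
        self.offsets[index + 1] = end  # only the NEXT offset moves; later ones do not
        self.last_set = index

    def set_value_count(self, count: int) -> None:
        # Slots after last_set are treated as unwritten and become empty.
        for j in range(self.last_set + 1, count):
            self.offsets[j + 1] = self.offsets[j]

    def get(self, index: int) -> bytes:
        return bytes(self.data[self.offsets[index]:self.offsets[index + 1]])
```

In this model, "7_new" is 2 bytes shorter than "7_mtest". offsets[8] therefore moves back by 2, and index 8 picks up the leftover "st" bytes.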



[jira] [Updated] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9037:
--
Labels: pull-request-available  (was: )

> [C++/C-ABI] unable to import array with null count == -1 (which could be 
> exported)
> --
>
> Key: ARROW-9037
> URL: https://issues.apache.org/jira/browse/ARROW-9037
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.17.1
>Reporter: Zhuo Peng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If an Array is created with null_count == -1 but without any nulls (and thus 
> no null bitmap buffer), then ArrayData.null_count will remain -1 when 
> exporting if null_count is never computed. The exported C struct also has 
> null_count == -1 [1]. But when importing, if null_count != 0, an error [2] 
> will be raised.
> [1] 
> https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560
> [2] 
> https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404
>  
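
In the C data interface, a null_count of -1 means "not computed yet". Below is a hedged Python sketch of one import-side rule that would accept such arrays. It is not the actual bridge.cc logic, and `imported_null_count` is a hypothetical name.

```python
from typing import Optional


def imported_null_count(null_count: int, validity: Optional[bytes], length: int) -> int:
    """Resolve the null count of an imported array (hypothetical import-side rule).

    null_count == -1 means "not computed yet" in the C data interface.
    """
    if validity is None:
        # Without a validity buffer every slot is valid, so 0 and -1 are both consistent.
        if null_count in (0, -1):
            return 0
        raise ValueError("null_count must be 0 or -1 without a validity buffer")
    if null_count == -1:
        # Compute it: count the valid (set) bits among the first `length` slots.
        valid = sum((validity[i // 8] >> (i % 8)) & 1 for i in range(length))
        return length - valid
    return null_count
```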



[jira] [Updated] (ARROW-9032) [C++] Split arrow/util/bit_util.h into multiple header files

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9032:
--
Labels: pull-request-available  (was: )

> [C++] Split arrow/util/bit_util.h into multiple header files
> 
>
> Key: ARROW-9032
> URL: https://issues.apache.org/jira/browse/ARROW-9032
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This header has grown quite large and any given compilation unit's use of it 
> is likely limited to only a couple of functions or classes. I suspect it 
> would improve compilation time to split up this header into a few headers 
> organized by frequency of code use. 



[jira] [Updated] (ARROW-6602) [Doc] Add feature / implementation matrix

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6602:
--
Labels: pull-request-available  (was: )

> [Doc] Add feature / implementation matrix
> -
>
> Key: ARROW-6602
> URL: https://issues.apache.org/jira/browse/ARROW-6602
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have many different implementations and each implementation makes a 
> different set of features available. It would be nice to have a top-level doc 
> page making it clear which implementation supports what.



[jira] [Updated] (ARROW-8766) [Python] A FileSystem implementation based on Python callbacks

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8766:
--
Labels: dataset-dask-integration filesystem pull-request-available  (was: 
dataset-dask-integration filesystem)

> [Python] A FileSystem implementation based on Python callbacks
> --
>
> Key: ARROW-8766
> URL: https://issues.apache.org/jira/browse/ARROW-8766
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Python
>Reporter: Joris Van den Bossche
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: dataset-dask-integration, filesystem, 
> pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new {{pyarrow.fs}} filesystems are now actual C++ objects, and no longer 
> "just" a Python interface. So they can't easily be extended from the Python 
> side, and the existing integration with {{fsspec}} filesystems is therefore 
> also no longer working. 
> One possible solution is to have a C++ filesystem that calls back into a 
> Python object for each of its methods (possibly similar to how you can 
> implement a Flight server in Python, I suppose). 
> Such a FileSystem implementation would allow us to make a {{pyarrow.fs}} wrapper 
> for {{fsspec}} filesystems, and thus allow such filesystems to be used in 
> pyarrow where new filesystems are expected.
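
The callback design can be sketched in pure Python. Here `CallbackFileSystem` stands in for the native filesystem and forwards every method to a handler object. An fsspec adapter would be one such handler, calling fsspec methods in place of the dict used below. All class and method names are hypothetical, not pyarrow's API.

```python
from typing import Dict, List, Protocol


class FileSystemHandler(Protocol):
    """Hypothetical callbacks the native filesystem would invoke."""

    def read_bytes(self, path: str) -> bytes: ...
    def write_bytes(self, path: str, data: bytes) -> None: ...
    def list_dir(self, path: str) -> List[str]: ...


class CallbackFileSystem:
    """Stand-in for the native filesystem: each method forwards to the handler."""

    def __init__(self, handler: FileSystemHandler) -> None:
        self._handler = handler

    def open_input(self, path: str) -> bytes:
        return self._handler.read_bytes(path)

    def write(self, path: str, data: bytes) -> None:
        self._handler.write_bytes(path, data)

    def ls(self, path: str) -> List[str]:
        return sorted(self._handler.list_dir(path))


class DictHandler:
    """Example handler backed by a dict of path -> bytes."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def read_bytes(self, path: str) -> bytes:
        return self.files[path]

    def write_bytes(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def list_dir(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return [p for p in self.files if p.startswith(prefix)]
```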



[jira] [Updated] (ARROW-3154) [Python][C++] Document how to write _metadata, _common_metadata files with Parquet datasets

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3154:
--
Labels: dataset parquet pull-request-available  (was: dataset parquet)

> [Python][C++] Document how to write _metadata, _common_metadata files with 
> Parquet datasets
> ---
>
> Key: ARROW-3154
> URL: https://issues.apache.org/jira/browse/ARROW-3154
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: dataset, parquet, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is not mentioned in great detail in 
> http://arrow.apache.org/docs/python/parquet.html



[jira] [Updated] (ARROW-9029) [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9029:
--
Labels: pull-request-available  (was: )

> [C++] Implement BitmapScanner interface to accelerate processing of 
> mostly-not-null data
> 
>
> Key: ARROW-9029
> URL: https://issues.apache.org/jira/browse/ARROW-9029
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In analytics, it is common for data to be all not-null or mostly not-null. 
> Data with > 50% nulls tends to be more exceptional. In this light, our 
> {{BitmapReader}} class which allows iteration of each bit in a bitmap can be 
> computationally suboptimal for mostly set validity bitmaps.
> I propose instead a new interface for use in kernel implementations, called, 
> for lack of a better term, {{BitmapScanner}}. This works as follows:
> * Uses popcount to accumulate consecutive 64-bit words from a bitmap where 
> all values are set, up to some limit (e.g. anywhere from 8 to 128 words or 
> more -- we can use benchmarks to determine what is a good limit). The length 
> of this "all-on" run is returned to the caller in a single function call, so 
> that this "run" of data can be processed without any bit-by-bit bitmap 
> checking
> * If words containing unset bits are encountered, the scanner will similarly 
> accumulate non-full words until the next full word is encountered or a limit 
> is hit. The length of this "has nulls" run is returned to the caller, which 
> then proceeds bit-by-bit to process the data
> For data with a lot of nulls, this may degrade performance somewhat, but 
> probably not by much in practice. However, data that is mostly-not-null 
> should benefit from this. 
> This BitmapScanner utility can probably also be used to accelerate the 
> implementation of Filter for mostly-not-null data.
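
The scanning scheme above can be sketched in Python as a generator of (all_set, run_length) pairs. The sketch assumes the bitmap starts at bit offset 0 and its length is a multiple of 64, so a real implementation would also need to handle a partial trailing word. `scan_runs` is a hypothetical name, and `max_words=8` is an arbitrary placeholder for the benchmark-tuned limit.

```python
from typing import Iterator, Tuple


def _word(bitmap: bytes, index: int) -> int:
    """Read the `index`-th 64-bit little-endian word of the bitmap."""
    return int.from_bytes(bitmap[index * 8:(index + 1) * 8], "little")


def _is_full(bitmap: bytes, index: int) -> bool:
    # A word is "all-on" exactly when its popcount is 64.
    return bin(_word(bitmap, index)).count("1") == 64


def scan_runs(bitmap: bytes, length: int, max_words: int = 8) -> Iterator[Tuple[bool, int]]:
    """Yield (all_set, run_length_in_bits) for runs of consecutive 64-bit words.

    Assumes bit offset 0 and `length` a multiple of 64.
    """
    n_words = length // 64
    i = 0
    while i < n_words:
        all_set = _is_full(bitmap, i)
        j = i
        # Grow the run while words share the same status, up to `max_words` words.
        while j < n_words and j - i < max_words and _is_full(bitmap, j) == all_set:
            j += 1
        yield all_set, (j - i) * 64
        i = j
```

A caller would process an `all_set` run with a tight loop over the values and fall back to per-bit checks only for the other runs.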



[jira] [Updated] (ARROW-8946) [Python] Add tests for parquet.write_metadata metadata_collector

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8946:
--
Labels: pull-request-available  (was: )

> [Python] Add tests for parquet.write_metadata metadata_collector
> 
>
> Key: ARROW-8946
> URL: https://issues.apache.org/jira/browse/ARROW-8946
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Follow-up on ARROW-8062: the PR added functionality to 
> {{parquet.write_metadata}} to pass a collection of metadata objects to be 
> concatenated. We should add some specific tests for this.



[jira] [Updated] (ARROW-9026) [C++/Python] Force package removal from arrow-nightlies conda repository

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9026:
--
Labels: pull-request-available  (was: )

> [C++/Python] Force package removal from arrow-nightlies conda repository
> 
>
> Key: ARROW-9026
> URL: https://issues.apache.org/jira/browse/ARROW-9026
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9024) [C++/Python] Install anaconda-client in conda-clean job

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9024:
--
Labels: pull-request-available  (was: )

> [C++/Python] Install anaconda-client in conda-clean job
> ---
>
> Key: ARROW-9024
> URL: https://issues.apache.org/jira/browse/ARROW-9024
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9023) [C++] Use mimalloc conda package

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9023:
--
Labels: pull-request-available  (was: )

> [C++] Use mimalloc conda package
> 
>
> Key: ARROW-9023
> URL: https://issues.apache.org/jira/browse/ARROW-9023
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe Korn
>Assignee: Uwe Korn
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-9022) [C++][Compute] Make Add function safe for numeric limits

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9022:
--
Labels: pull-request-available  (was: )

> [C++][Compute] Make Add function safe for numeric limits
> 
>
> Key: ARROW-9022
> URL: https://issues.apache.org/jira/browse/ARROW-9022
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the output type of the Add function is identical to the argument 
> types, which makes it unsafe to add numeric limit values. So instead of the 
> {{(int8, int8) -> int8}} signature, we should use {{(int8, int8) -> int16}}.
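
The Python sketch below contrasts the two signatures on the int8 limits. The same-width variant assumes wrap-around on overflow rather than an error. Both function names are hypothetical, not the compute API.

```python
INT8_MIN, INT8_MAX = -(2 ** 7), 2 ** 7 - 1
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def add_int8_to_int8(a: int, b: int) -> int:
    """(int8, int8) -> int8: the true sum is reduced modulo 2**8 (wraps on overflow)."""
    return (a + b - INT8_MIN) % 256 + INT8_MIN


def add_int8_to_int16(a: int, b: int) -> int:
    """(int8, int8) -> int16: the widened result type can hold every possible sum."""
    if not (INT8_MIN <= a <= INT8_MAX and INT8_MIN <= b <= INT8_MAX):
        raise ValueError("inputs must be int8 values")
    result = a + b
    # The sum of two int8 values lies in [-256, 254], well inside the int16 range.
    assert INT16_MIN <= result <= INT16_MAX
    return result
```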



[jira] [Updated] (ARROW-9021) [Python] The filesystem keyword in parquet.read_table is not documented

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9021:
--
Labels: pull-request-available  (was: )

> [Python] The filesystem keyword in parquet.read_table is not documented
> ---
>
> Key: ARROW-9021
> URL: https://issues.apache.org/jira/browse/ARROW-9021
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[jira] [Updated] (ARROW-8979) [C++] Implement bitmap word reader and writer

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8979:
--
Labels: pull-request-available  (was: )

> [C++] Implement bitmap word reader and writer
> -
>
> Key: ARROW-8979
> URL: https://issues.apache.org/jira/browse/ARROW-8979
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Yibo Cai
>Assignee: Yibo Cai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The three Jira tasks below optimize bitmap operations (logical, copy, compare, 
> etc.) for the unaligned case. They use a word-by-word approach instead of 
> bit-by-bit processing to improve performance.
>  There is some common code for reading/writing bitmaps in words. It's better to 
> implement a word-based bitmap reader and writer to wrap this shared 
> functionality and reduce code redundancy.
>  https://issues.apache.org/jira/browse/ARROW-8553
>  https://issues.apache.org/jira/browse/ARROW-8843
>  https://issues.apache.org/jira/browse/ARROW-8844
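
Below is a minimal Python sketch of the reader half. It reads a bitmap slice that starts at an arbitrary bit offset as whole words, and each yielded word has bit 0 equal to the first bit of the slice. `read_words` is a hypothetical name. A C++ version must also avoid reading past the end of the buffer, which Python slicing tolerates silently. A writer would mirror the same logic with shifts and masks in the other direction.

```python
from typing import Iterator


def read_words(bitmap: bytes, bit_offset: int, length: int,
               word_bits: int = 64) -> Iterator[int]:
    """Yield `word_bits`-bit words of the slice [bit_offset, bit_offset + length).

    Bits are LSB-first, as in Arrow bitmaps; the final word may be partial.
    """
    for pos in range(0, length, word_bits):
        n = min(word_bits, length - pos)
        start = bit_offset + pos
        first_byte, shift = divmod(start, 8)
        # Cover n bits after shifting (one extra byte when the start is unaligned).
        nbytes = (shift + n + 7) // 8
        raw = int.from_bytes(bitmap[first_byte:first_byte + nbytes], "little")
        yield (raw >> shift) & ((1 << n) - 1)
```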



[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4633:
--
Labels: dataset-parquet-read newbie parquet pull-request-available  (was: 
dataset-parquet-read newbie parquet)

> [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway
> --
>
> Key: ARROW-4633
> URL: https://issues.apache.org/jira/browse/ARROW-4633
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.12.0
> Environment: Linux, Python 3.7.1, pyarrow.__version__ = 0.12.0
>Reporter: Taylor Johnson
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: dataset-parquet-read, newbie, parquet, 
> pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following code seems to suggest that ParquetFile.read(use_threads=False) 
> still creates a ThreadPool.  This is observed in 
> ParquetFile.read_row_group(use_threads=False) as well. 
> This does not appear to be a problem in 
> pyarrow.Table.to_pandas(use_threads=False).
> I've tried tracing the error.  Starting in python/pyarrow/parquet.py, both 
> ParquetReader.read_all() and ParquetReader.read_row_group() pass the 
> use_threads input along to self.reader which is a ParquetReader imported from 
> _parquet.pyx.
> Following the calls into python/pyarrow/_parquet.pyx, we see that 
> ParquetReader.read_all() and ParquetReader.read_row_group() have the 
> following code which seems a bit suspicious
> {quote}if use_threads:
>     self.set_use_threads(use_threads)
> {quote}
> Why not just always call self.set_use_threads(use_threads)?
> The ParquetReader.set_use_threads simply calls 
> self.reader.get().set_use_threads(use_threads).  This self.reader is assigned 
> as unique_ptr[FileReader].  I think this points to 
> cpp/src/parquet/arrow/reader.cc, but I'm not sure about that. The 
> FileReader::Impl::ReadRowGroup logic looks OK, as 
> ::arrow::internal::GetCpuThreadPool() is only called if use_threads is True. 
> The same is true for ReadTable.
> So when is the ThreadPool getting created?
> Example code:
> --
> {quote}import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.parquet as pq
> use_threads=False
> p=psutil.Process()
> print('Starting with {} threads'.format(p.num_threads()))
> df = pd.DataFrame({'x': [0]})
> table = pa.Table.from_pandas(df)
> print('After table creation, {} threads'.format(p.num_threads()))
> df = table.to_pandas(use_threads=use_threads)
> print('table.to_pandas(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> writer = pq.ParquetWriter('tmp.parquet', table.schema)
> writer.write_table(table)
> writer.close()
> print('After writing parquet file, {} threads'.format(p.num_threads()))
> pf = pq.ParquetFile('tmp.parquet')
> print('After ParquetFile, {} threads'.format(p.num_threads()))
> df = pf.read(use_threads=use_threads).to_pandas()
> print('After pf.read(use_threads={}), {} threads'.format(use_threads, 
> p.num_threads()))
> {quote}
> ---
> $ python pyarrow_test.py
> Starting with 1 threads
> After table creation, 1 threads
> table.to_pandas(use_threads=False), 1 threads
> After writing parquet file, 1 threads
> After ParquetFile, 1 threads
> After pf.read(use_threads=False), 5 threads
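
The question about the conditional `if use_threads:` call can be illustrated with a hypothetical stand-in reader whose setting persists across calls. This shows only why "always forward the flag" is safer. It does not by itself explain the report, since the example above never passes True. All names below are hypothetical.

```python
class FakeNativeReader:
    """Hypothetical stand-in for a native reader whose setting persists across calls."""

    def __init__(self) -> None:
        self.use_threads = False

    def set_use_threads(self, value: bool) -> None:
        self.use_threads = value


def read_conditional(reader: FakeNativeReader, use_threads: bool) -> bool:
    # Pattern quoted above: the setter only runs when use_threads is truthy.
    if use_threads:
        reader.set_use_threads(use_threads)
    return reader.use_threads  # the setting the read would actually use


def read_always(reader: FakeNativeReader, use_threads: bool) -> bool:
    # Suggested pattern: always forward the caller's choice.
    reader.set_use_threads(use_threads)
    return reader.use_threads
```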



[jira] [Updated] (ARROW-2702) [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc to see if we are using the right error type in each instance

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2702:
--
Labels: pull-request-available  (was: )

> [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc 
> to see if we are using the right error type in each instance
> -
>
> Key: ARROW-2702
> URL: https://issues.apache.org/jira/browse/ARROW-2702
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See discussion in [https://github.com/apache/arrow/pull/2075]




[jira] [Updated] (ARROW-9018) [C++] Remove APIs that were deprecated in 0.17.x and prior

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9018:
--
Labels: pull-request-available  (was: )

> [C++] Remove APIs that were deprecated in 0.17.x and prior
> --
>
> Key: ARROW-9018
> URL: https://issues.apache.org/jira/browse/ARROW-9018
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-8904) [Python] Fix usages of deprecated C++ APIs related to child/field

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8904:
--
Labels: pull-request-available  (was: )

> [Python] Fix usages of deprecated C++ APIs related to child/field
> -
>
> Key: ARROW-8904
> URL: https://issues.apache.org/jira/browse/ARROW-8904
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> -- Running cmake --build for pyarrow
> cmake --build . --config debug -- -j16
> [19/20] Building CXX object CMakeFiles/lib.dir/lib.cpp.o
> lib.cpp:20265:85: warning: 'num_children' is deprecated: Use num_fields() 
> [-Wdeprecated-declarations]
>   __pyx_t_1 = __pyx_f_7pyarrow_3lib__normalize_index(__pyx_v_i, 
> __pyx_v_self->type->num_children()); if (unlikely(__pyx_t_1 == 
> ((Py_ssize_t)-1L))) __PYX_ERR(1, 119, __pyx_L1_error)
>   
>   ^
> /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use num_fields()")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:20276:76: warning: 'child' is deprecated: Use field(i) 
> [-Wdeprecated-declarations]
>   __pyx_t_2 = 
> __pyx_f_7pyarrow_3lib_pyarrow_wrap_field(__pyx_v_self->type->child(__pyx_v_index));
>  if (unlikely(!__pyx_t_2)) __PYX_ERR(1, 120, __pyx_L1_error)
>^
> /home/wesm/local/include/arrow/type.h:251:3: note: 'child' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use field(i)")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:20507:56: warning: 'num_children' is deprecated: Use num_fields() 
> [-Wdeprecated-declarations]
>   __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_self->type->num_children()); if 
> (unlikely(!__pyx_t_1)) __PYX_ERR(1, 139, __pyx_L1_error)
>^
> /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use num_fields()")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:23361:44: warning: 'num_children' is deprecated: Use num_fields() 
> [-Wdeprecated-declarations]
>   __pyx_r = __pyx_v_self->__pyx_base.type->num_children();
>^
> /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use num_fields()")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:24039:44: warning: 'num_children' is deprecated: Use num_fields() 
> [-Wdeprecated-declarations]
>   __pyx_r = __pyx_v_self->__pyx_base.type->num_children();
>^
> /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use num_fields()")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:58220:37: warning: 'child' is deprecated: Use field(pos) 
> [-Wdeprecated-declarations]
>   __pyx_v_child = __pyx_v_self->ap->child(__pyx_v_child_id);
> ^
> /home/wesm/local/include/arrow/array.h:1281:3: note: 'child' has been 
> explicitly marked deprecated here
>   ARROW_DEPRECATED("Use field(pos)")
>   ^
> /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from 
> macro 'ARROW_DEPRECATED'
> #  define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__)))
>^
> lib.cpp:58956:74: warning: 'children' is 

[jira] [Updated] (ARROW-8951) [C++] Fix compiler warning in compute/kernels/scalar_cast_temporal.cc

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8951:
--
Labels: pull-request-available  (was: )

> [C++] Fix compiler warning in compute/kernels/scalar_cast_temporal.cc
> -
>
> Key: ARROW-8951
> URL: https://issues.apache.org/jira/browse/ARROW-8951
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The kernel functor can return an uninitialized value on error:
> {code}
> ../src/arrow/compute/kernels/scalar_cast_temporal.cc: In member function ‘OUT 
> arrow::compute::internal::ParseTimestamp::Call(arrow::compute::KernelContext*,
>  ARG0) const [with OUT = long int; ARG0 = 
> nonstd::sv_lite::basic_string_view]’:
> ../src/arrow/compute/kernels/scalar_cast_temporal.cc:267:12: warning: 
> ‘result’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>  return result;
> {code}




[jira] [Updated] (ARROW-9016) [Java] Remove direct references to Netty/Unsafe Allocators

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9016:
--
Labels: pull-request-available  (was: )

> [Java] Remove direct references to Netty/Unsafe Allocators
> --
>
> Key: ARROW-9016
> URL: https://issues.apache.org/jira/browse/ARROW-9016
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of ARROW-8230, this removes the direct references to the Netty and
> Unsafe allocation managers in `DefaultAllocationManagerOption`.




[jira] [Updated] (ARROW-9015) [Java] Make BaseAllocator package private

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9015:
--
Labels: pull-request-available  (was: )

> [Java] Make BaseAllocator package private
> -
>
> Key: ARROW-9015
> URL: https://issues.apache.org/jira/browse/ARROW-9015
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Ryan Murray
>Assignee: Ryan Murray
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As part of the Netty work in ARROW-8230, it became clear that BaseAllocator
> should be package-private.




[jira] [Updated] (ARROW-9014) [Packaging] Bump the minor part of the automatically generated version in crossbow

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9014:
--
Labels: pull-request-available  (was: )

> [Packaging] Bump the minor part of the automatically generated version in 
> crossbow
> --
>
> Key: ARROW-9014
> URL: https://issues.apache.org/jira/browse/ARROW-9014
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Crossbow uses setuptools_scm to generate a development version number using
> the git describe command. This means that it finds the latest {{reachable}}
> tag from the current commit on master.
> The minor releases are created from the master branch, whereas the patch
> release tags point to commits on maintenance branches (like 0.17.x) and are
> therefore not reachable from master. So even if we have already released a
> patch version like 0.17.1, crossbow starts from the 0.17.0 tag, generating a
> version number like 0.17.0.dev{number-of-commits-from-0.17.0}, and bumps its
> patch part, eventually creating binary packages with version 0.17.1.dev123.
> The main problem with this is that the produced nightly Python wheels are not
> picked up by pip: PyPI already has that patch release available, and pip
> doesn't consider 0.17.1.dev123 newer than 0.17.1 (even with the --pre option
> passed).
> So to force pip to install the newer nightly packages, we need to bump the
> minor version instead.
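To illustrate the ordering problem, here is a minimal sketch. It assumes only simple `X.Y.Z` and `X.Y.Z.devN` version strings and hand-rolls the relevant slice of PEP 440 ordering (a dev release sorts before the final release with the same number). It is not the setuptools_scm or pip implementation, and the helper names `sort_key` and `nightly_version` are hypothetical.

```python
import re

# Hypothetical helper: parses "X.Y.Z" or "X.Y.Z.devN" only (a tiny subset of PEP 440).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?$")


def sort_key(version: str):
    """Return a tuple that orders versions the way PEP 440 does for this subset."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, dev = m.groups()
    # A dev release sorts *before* the final release with the same release
    # segment, so final releases get an "infinite" dev number.
    dev_key = int(dev) if dev is not None else float("inf")
    return (int(major), int(minor), int(patch), dev_key)


def nightly_version(last_tag: str, commits_since_tag: int, bump: str) -> str:
    """Sketch of a dev version derived from the last reachable tag."""
    major, minor, patch, _ = sort_key(last_tag)
    if bump == "patch":
        # Behavior described in this issue: 0.17.0 -> 0.17.1.devN
        return f"{major}.{minor}.{patch + 1}.dev{commits_since_tag}"
    if bump == "minor":
        # Proposed behavior: 0.17.0 -> 0.18.0.devN
        return f"{major}.{minor + 1}.0.dev{commits_since_tag}"
    raise ValueError(f"unknown bump: {bump!r}")


latest_on_pypi = "0.17.1"
for bump in ("patch", "minor"):
    nightly = nightly_version("0.17.0", 123, bump)
    newer = sort_key(nightly) > sort_key(latest_on_pypi)
    print(f"{bump}: {nightly} newer than {latest_on_pypi}? {newer}")
```

With a patch bump the nightly version sorts below the already-published 0.17.1, so pip ignores it. A minor bump sorts above it. For real-world checks, `packaging.version.Version` implements the full PEP 440 rules.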




[jira] [Updated] (ARROW-9010) [Java] Framework and interface changes for RecordBatch IPC buffer compression

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9010:
--
Labels: pull-request-available  (was: )

> [Java] Framework and interface changes for RecordBatch IPC buffer compression
> -
>
> Key: ARROW-9010
> URL: https://issues.apache.org/jira/browse/ARROW-9010
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is the first sub-work item of ARROW-8672 ([Java] Implement RecordBatch
> IPC buffer compression from ARROW-300). However, it does not involve any
> concrete compression algorithms. The purpose of this PR is to establish basic
> interfaces for data compression and to change the IPC framework so that
> different compression algorithms can be plugged in smoothly.
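The pluggable-codec idea can be sketched as follows. This version is in Python for brevity, although the actual change targets the Java IPC code. All names here are hypothetical: `CompressionCodec`, `CODECS`, `write_body`, and `read_body`. The sketch also uses zlib purely as a stand-in because it ships with Python, not because Arrow IPC uses it. The point is that the writer and reader depend only on the interface, so a new algorithm is added by registering another codec.

```python
from __future__ import annotations

import zlib
from abc import ABC, abstractmethod


class CompressionCodec(ABC):
    """Hypothetical codec interface: the IPC layer only talks to this."""

    name: str

    @abstractmethod
    def compress(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decompress(self, data: bytes) -> bytes: ...


class NoCompressionCodec(CompressionCodec):
    """Default codec: buffers pass through unchanged."""

    name = "none"

    def compress(self, data: bytes) -> bytes:
        return data

    def decompress(self, data: bytes) -> bytes:
        return data


class ZlibCodec(CompressionCodec):
    """Stand-in concrete codec (zlib chosen only because it is in the stdlib)."""

    name = "zlib"

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# Registry: new algorithms plug in here without touching the writer/reader.
CODECS = {codec.name: codec for codec in (NoCompressionCodec(), ZlibCodec())}


def write_body(buffers: list[bytes], codec_name: str = "none") -> list[bytes]:
    """Hypothetical IPC body writer: compresses each buffer with the chosen codec."""
    codec = CODECS[codec_name]
    return [codec.compress(buf) for buf in buffers]


def read_body(buffers: list[bytes], codec_name: str = "none") -> list[bytes]:
    """Hypothetical IPC body reader: the inverse of write_body."""
    codec = CODECS[codec_name]
    return [codec.decompress(buf) for buf in buffers]


original = [b"\x00" * 1024, b"arrow" * 100]
for name in CODECS:
    round_tripped = read_body(write_body(original, name), name)
    print(name, round_tripped == original)
```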




[jira] [Updated] (ARROW-9005) Support sort expression

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9005:
--
Labels: pull-request-available  (was: )

> Support sort expression
> ---
>
> Key: ARROW-9005
> URL: https://issues.apache.org/jira/browse/ARROW-9005
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-9004) [C++][Gandiva] Upgrade to LLVM 10

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-9004:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Upgrade to LLVM 10
> -
>
> Key: ARROW-9004
> URL: https://issues.apache.org/jira/browse/ARROW-9004
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-8929) [C++] Change compute::Arity:VarArgs min_args default to 0

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8929:
--
Labels: pull-request-available  (was: )

> [C++] Change compute::Arity:VarArgs min_args default to 0
> -
>
> Key: ARROW-8929
> URL: https://issues.apache.org/jira/browse/ARROW-8929
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The minimum number of arguments is a separate concern from providing an
> {{InputType}} for input type checking.
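The separation between counting arguments and checking their types can be sketched as follows. This minimal version is written in Python rather than C++. `Arity`, `var_args`, and `check_arity` below are hypothetical mirrors for illustration, not Arrow's actual C++ API. The count check knows nothing about types, so a default minimum of 0 simply means "any number of arguments, including none". Type validation would be a separate step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arity:
    """Hypothetical Python mirror of a function arity descriptor."""

    num_args: int            # exact count, or the minimum count when is_varargs
    is_varargs: bool = False

    @staticmethod
    def unary() -> "Arity":
        return Arity(1)

    @staticmethod
    def var_args(min_args: int = 0) -> "Arity":
        # The default proposed in ARROW-8929: no minimum unless one is requested.
        return Arity(min_args, is_varargs=True)


def check_arity(arity: Arity, num_given: int) -> None:
    """Count-only validation; argument types would be checked separately."""
    if arity.is_varargs:
        if num_given < arity.num_args:
            raise ValueError(
                f"expected at least {arity.num_args} argument(s), got {num_given}")
    elif num_given != arity.num_args:
        raise ValueError(
            f"expected exactly {arity.num_args} argument(s), got {num_given}")


check_arity(Arity.var_args(), 0)   # accepted: no minimum by default
check_arity(Arity.var_args(2), 3)  # accepted: at least 2 required
try:
    check_arity(Arity.unary(), 2)
except ValueError as exc:
    print(exc)
```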




[jira] [Updated] (ARROW-8985) [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers type for forward compatibility

2020-06-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8985:
--
Labels: pull-request-available  (was: )

> [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers 
> type for forward compatibility
> 
>
> Key: ARROW-8985
> URL: https://issues.apache.org/jira/browse/ARROW-8985
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This will permit larger or smaller decimals to be added to the format later
> without having to add a new Type union value.
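The forward-compatibility argument can be sketched as follows. In Flatbuffers, reading a scalar field that the writer never set yields the schema's default value. So metadata from producers that predate the field would decode as 16-byte decimals, while a reader can explicitly reject widths it does not understand. In this minimal sketch, a plain dict stands in for the Flatbuffers table. The field name `byteWidth`, the `DecimalType` class, and the 32-byte example are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_DECIMAL_BYTE_WIDTH = 16  # default proposed in this issue
SUPPORTED_BYTE_WIDTHS = {16}     # what this hypothetical reader can decode


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int
    byte_width: int = DEFAULT_DECIMAL_BYTE_WIDTH


def decimal_from_metadata(fields: dict) -> DecimalType:
    """Decode a Decimal type from a dict standing in for the Flatbuffers table.

    Like a Flatbuffers scalar field with a default, an absent "byteWidth"
    (e.g. from an older producer) reads as 16.
    """
    byte_width = fields.get("byteWidth", DEFAULT_DECIMAL_BYTE_WIDTH)
    if byte_width not in SUPPORTED_BYTE_WIDTHS:
        # A future wider/narrower decimal is rejected explicitly instead of
        # being misread as a 16-byte value.
        raise NotImplementedError(f"unsupported decimal byte width: {byte_width}")
    return DecimalType(fields["precision"], fields["scale"], byte_width)


old_message = {"precision": 38, "scale": 10}                   # no byteWidth field
new_message = {"precision": 76, "scale": 10, "byteWidth": 32}  # hypothetical future width

print(decimal_from_metadata(old_message))
try:
    decimal_from_metadata(new_message)
except NotImplementedError as exc:
    print(exc)
```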



