[GitHub] [arrow] nickpoorman commented on pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox
nickpoorman commented on pull request #7670: URL: https://github.com/apache/arrow/pull/7670#issuecomment-655192799 @stuartcarnie @sbinet This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] sagnikc-dremio commented on pull request #7641: ARROW-9328: [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string

2020-07-07 Thread GitBox
sagnikc-dremio commented on pull request #7641: URL: https://github.com/apache/arrow/pull/7641#issuecomment-655228953 @pprudhvi @projjal Can you please review this change?

[GitHub] [arrow] kou opened a new pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
kou opened a new pull request #7669: URL: https://github.com/apache/arrow/pull/7669

[GitHub] [arrow] github-actions[bot] commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7669: URL: https://github.com/apache/arrow/pull/7669#issuecomment-655153095 https://issues.apache.org/jira/browse/ARROW-9351

[GitHub] [arrow] github-actions[bot] commented on pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7670: URL: https://github.com/apache/arrow/pull/7670#issuecomment-655153093 https://issues.apache.org/jira/browse/ARROW-9365

[GitHub] [arrow] liyafan82 commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
liyafan82 commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655232313 > Looks fine for the most part, but I'm not really sure why we need to separate `arrow-memory-core` and `arrow-memory-unsafe`? Couldn't those be combined since it wouldn't add

[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox
houqp commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451273265 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into<StructArray> for RecordBatch { } } -/// Definition of record batch reader. +///

[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox
houqp commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451278084 ## File path: rust/arrow/src/record_batch.rs ## @@ -216,15 +216,28 @@ impl Into<StructArray> for RecordBatch { } } -/// Definition of record batch reader. +///

[GitHub] [arrow] github-actions[bot] commented on pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7671: URL: https://github.com/apache/arrow/pull/7671#issuecomment-655212330 https://issues.apache.org/jira/browse/ARROW-8344

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451235584 ## File path: java/memory/memory-core/pom.xml ## @@ -0,0 +1,65 @@ + + +http://maven.apache.org/POM/4.0.0; +

[GitHub] [arrow] BryanCutler commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
BryanCutler commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655246147 On a related note, it seems like our netty version 4.1.27 is pretty old now, ~2 years, do you all think it would be good to upgrade this before the 1.0.0 release? It looks

[GitHub] [arrow] github-actions[bot] commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7669: URL: https://github.com/apache/arrow/pull/7669#issuecomment-655148194 Revision: 3d5cc704cda625df93aa045e17860fc5d5ea62d5 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
kou commented on pull request #7669: URL: https://github.com/apache/arrow/pull/7669#issuecomment-655147282 @github-actions crossbow submit test-ubuntu-18.04-cpp-cmake32

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
mr-smidge commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r451216615 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -237,7 +329,9 @@ public ReadOnlySpan<byte> GetBytes(int index) if

[GitHub] [arrow] jacques-n commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
jacques-n commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655210526 > Looks fine for the most part, but I'm not really sure why we need to separate `arrow-memory-core` and `arrow-memory-unsafe`? Couldn't those be combined since it wouldn't add

[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
BryanCutler commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r451102944 ## File path: java/adapter/orc/pom.xml ## @@ -15,10 +15,16 @@ org.apache.arrow -arrow-memory -

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox
c-jamie commented on a change in pull request #7635: URL: https://github.com/apache/arrow/pull/7635#discussion_r451159046 ## File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc ## @@ -0,0 +1,223 @@ + +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox
c-jamie commented on a change in pull request #7635: URL: https://github.com/apache/arrow/pull/7635#discussion_r451159113 ## File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc ## @@ -0,0 +1,223 @@ + +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] nickpoorman opened a new pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox
nickpoorman opened a new pull request #7670: URL: https://github.com/apache/arrow/pull/7670 This PR adds the rest of the implemented typed array builders to the NewBuilder function. I ran into needing this because `NewStructBuilder` internally calls `NewBuilder`.

[GitHub] [arrow] kou commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
kou commented on pull request #7669: URL: https://github.com/apache/arrow/pull/7669#issuecomment-655204658 +1

[GitHub] [arrow] kou closed pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox
kou closed pull request #7669: URL: https://github.com/apache/arrow/pull/7669

[GitHub] [arrow] BryanCutler commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox
BryanCutler commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-655244616 I agree that the recommended allocator should still be the netty one for now, so I guess it wouldn't be good to bundle the unsafe allocator as a possible default. I'm good

[GitHub] [arrow] kou commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-07 Thread GitBox
kou commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r451154810 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +47,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" Review comment: Could you

[GitHub] [arrow] lidavidm commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-07 Thread GitBox
lidavidm commented on pull request #7664: URL: https://github.com/apache/arrow/pull/7664#issuecomment-655193980 Just a high level comment: if I'm reading this right, V4 is still the default metadata version and applications opt in to V5 when they want to read/write unions. Am I

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
mr-smidge commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r451215414 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
mr-smidge commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r451215646 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -173,11 +245,19 @@ public TBuilder Set(int index, byte value) throw

[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
stevengj edited a comment on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087 > It seems utf8proc (incorrectly?) claims some undefined codepoints (e.g. https://www.compart.com/en/unicode/U+08BE) are UTF8PROC_CATEGORY_LO (General category Letter

[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox
houqp commented on a change in pull request #7666: URL: https://github.com/apache/arrow/pull/7666#discussion_r451273911 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -20,13 +20,13 @@ use std::sync::{Arc, Mutex}; use arrow::datatypes::Schema; +use

[GitHub] [arrow] praveenbingo closed pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva

2020-07-07 Thread GitBox
praveenbingo closed pull request #7642: URL: https://github.com/apache/arrow/pull/7642

[GitHub] [arrow] wesm commented on issue #7663: Sorting on pyarrow data structures ?

2020-07-07 Thread GitBox
wesm commented on issue #7663: URL: https://github.com/apache/arrow/issues/7663#issuecomment-655197308 This isn't where we handle feature requests. There are some sorting-related issues in JIRA; if you do not find one that describes the APIs you are looking for, could you open a new

[GitHub] [arrow] wesm closed issue #7663: Sorting on pyarrow data structures ?

2020-07-07 Thread GitBox
wesm closed issue #7663: URL: https://github.com/apache/arrow/issues/7663

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
mr-smidge commented on a change in pull request #7671: URL: https://github.com/apache/arrow/pull/7671#discussion_r451214556 ## File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs ## @@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)

[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
stevengj edited a comment on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087 [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was defined in Unicode 13, and category Lo is correct. It sounds like you may be looking at

[GitHub] [arrow] stevengj commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
stevengj commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087 [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was defined in Unicode 13, and category Lo is correct.

[GitHub] [arrow] mr-smidge opened a new pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox
mr-smidge opened a new pull request #7671: URL: https://github.com/apache/arrow/pull/7671 This PR fixes a few bugs in `BinaryArray.Builder()`: * Fixes the `Clear()` method, which previously would break all subsequently-appended values (see JIRA ticket for examples). * Makes the

[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
stevengj edited a comment on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087 > It seems utf8proc (incorrectly?) claims some undefined codepoints (e.g. https://www.compart.com/en/unicode/U+08BE) are UTF8PROC_CATEGORY_LO (General category Letter

[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
stevengj edited a comment on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087 [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was defined in Unicode 13, and category Lo is correct. It sounds like you may be looking at

[GitHub] [arrow] arw2019 opened a new pull request #7672: ARROW-9348: [C++] Replace usages of TestBase::MakeRandomArray in testing/gtest_util.h with RandomArrayGenerator

2020-07-07 Thread GitBox
arw2019 opened a new pull request #7672: URL: https://github.com/apache/arrow/pull/7672 This PR addresses https://issues.apache.org/jira/browse/ARROW-9348

[GitHub] [arrow] github-actions[bot] commented on pull request #7672: ARROW-9348: [C++] Replace usages of TestBase::MakeRandomArray in testing/gtest_util.h with RandomArrayGenerator

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7672: URL: https://github.com/apache/arrow/pull/7672#issuecomment-655295896 https://issues.apache.org/jira/browse/ARROW-9348

[GitHub] [arrow] pitrou commented on a change in pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
pitrou commented on a change in pull request #7648: URL: https://github.com/apache/arrow/pull/7648#discussion_r450734963 ## File path: r/R/python.R ## @@ -73,6 +73,44 @@ r_to_py.RecordBatch <- function(x, convert = FALSE) { out } +r_to_py.ChunkedArray <- function(x,

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #7545: URL: https://github.com/apache/arrow/pull/7545#discussion_r450744941 ## File path: python/pyarrow/fs.py ## @@ -63,6 +63,31 @@ def __getattr__(name): ) +def _ensure_filesystem(filesystem, use_mmap=False):

[GitHub] [arrow] jorisvandenbossche closed pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-07 Thread GitBox
jorisvandenbossche closed pull request #7631: URL: https://github.com/apache/arrow/pull/7631

[GitHub] [arrow] romainfrancois commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-07 Thread GitBox
romainfrancois commented on pull request #7645: URL: https://github.com/apache/arrow/pull/7645#issuecomment-654732149 In other words, when creating a chunked array from a list of factors from R, should the dictionary be unified and shared across the arrays of the chunked array ?

[GitHub] [arrow] github-actions[bot] commented on pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7654: URL: https://github.com/apache/arrow/pull/7654#issuecomment-654734988 https://issues.apache.org/jira/browse/ARROW-8581

[GitHub] [arrow] romainfrancois commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-07 Thread GitBox
romainfrancois commented on pull request #7645: URL: https://github.com/apache/arrow/pull/7645#issuecomment-654675294 Any reason why `ChunkedArray$print()` does not use the `ToString()` C++ method ? @nealrichardson ``` r library(arrow, warn.conflicts = FALSE) f1 <-

[GitHub] [arrow] jorisvandenbossche commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-07 Thread GitBox
jorisvandenbossche commented on pull request #7623: URL: https://github.com/apache/arrow/pull/7623#issuecomment-654695256 OK, I now also enabled the commented-out tests

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-07 Thread GitBox
jorisvandenbossche commented on a change in pull request #7608: URL: https://github.com/apache/arrow/pull/7608#discussion_r450721280 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory { }

[GitHub] [arrow] jorisvandenbossche commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-07-07 Thread GitBox
jorisvandenbossche commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-654704786 > I think that any comparison involving the dict type should also work with the "effective" logical type (the value type of the dict). Opened

[GitHub] [arrow] jorisvandenbossche commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-07 Thread GitBox
jorisvandenbossche commented on pull request #7623: URL: https://github.com/apache/arrow/pull/7623#issuecomment-654709090 > What happens if a certain row group column has only nulls? Is the min/max then also null, or is it not defined? (might be good to add a test case for this) It

[GitHub] [arrow] jorisvandenbossche commented on pull request #7546: ARROW-8733: [C++][Dataset][Python] Expose RowGroupInfo statistics values

2020-07-07 Thread GitBox
jorisvandenbossche commented on pull request #7546: URL: https://github.com/apache/arrow/pull/7546#issuecomment-654718123 @rjzamora I opened https://issues.apache.org/jira/browse/ARROW-9346 to track the `total_byte_size` suggestion

[GitHub] [arrow] mr-smidge opened a new pull request #7654: ARROW-8581: [C#] Accept and return DateTime from DateXXArray

2020-07-07 Thread GitBox
mr-smidge opened a new pull request #7654: URL: https://github.com/apache/arrow/pull/7654 This PR introduces a _breaking change_ to the public API for `Date32Array` and `Date64Array` by changing the accepted and returned data type from `System.DateTimeOffset` to `System.DateTime`.

[GitHub] [arrow] pitrou commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
pitrou commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654729844 The main source of potential inefficiency here is that the Schema is exported/imported once for each chunk. We may or may not care immediately about this. Also, note that

[GitHub] [arrow] pitrou commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
pitrou commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654730028 (otherwise, the code here looks ok, but I'm not an R expert at all :-))

[GitHub] [arrow] romainfrancois commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-07 Thread GitBox
romainfrancois commented on pull request #7645: URL: https://github.com/apache/arrow/pull/7645#issuecomment-654744787 I think we can leave this for a follow up: ``` // R factor levels must be type "character" so coerce `dict` to STRSXP // TODO (npr): this coercion

[GitHub] [arrow] kszucs closed pull request #7640: ARROW-9327: [Rust] Fix all clippy errors for arrow crate

2020-07-07 Thread GitBox
kszucs closed pull request #7640: URL: https://github.com/apache/arrow/pull/7640

[GitHub] [arrow] pitrou closed pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-07 Thread GitBox
pitrou closed pull request #7632: URL: https://github.com/apache/arrow/pull/7632

[GitHub] [arrow] kszucs commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec

2020-07-07 Thread GitBox
kszucs commented on pull request #7176: URL: https://github.com/apache/arrow/pull/7176#issuecomment-654800827 Note that 1.0 release is about the format stability rather than API stability. Given the number of daily downloads from crates.io I don't think we should really worry about

[GitHub] [arrow] kszucs commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r450777633 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +47,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" Review comment: > $

[GitHub] [arrow] kszucs commented on pull request #7650: ARROW-9340: [R] Use CRAN version of decor package

2020-07-07 Thread GitBox
kszucs commented on pull request #7650: URL: https://github.com/apache/arrow/pull/7650#issuecomment-654783337 @nealrichardson the RTools 35 build has a failure because no binary is available for decor.

[GitHub] [arrow] kszucs commented on pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
kszucs commented on pull request #7655: URL: https://github.com/apache/arrow/pull/7655#issuecomment-654797194 Thanks for taking it over. It looks good to me except handling the multiple slashes on the python side.

[GitHub] [arrow] rok commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse tensor that manages its indices in separated vectors

2020-07-07 Thread GitBox
rok commented on pull request #7044: URL: https://github.com/apache/arrow/pull/7044#issuecomment-654824921 > Not only scipy, but also [SuiteSparse](https://github.com/DrTimothyAldenDavis/SuiteSparse) employs the split format. I didn't realize. Then we should indeed have proper

[GitHub] [arrow] github-actions[bot] commented on pull request #7657: ARROW-8886: [C#] Resize to negative length no longer permitted

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7657: URL: https://github.com/apache/arrow/pull/7657#issuecomment-654832532 https://issues.apache.org/jira/browse/ARROW-8886

[GitHub] [arrow] kszucs commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r450783001 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +47,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" Review comment: > FYI: We

[GitHub] [arrow] github-actions[bot] commented on pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7655: URL: https://github.com/apache/arrow/pull/7655#issuecomment-654791871 https://issues.apache.org/jira/browse/ARROW-9121

[GitHub] [arrow] maartenbreddels opened a new pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
maartenbreddels opened a new pull request #7656: URL: https://github.com/apache/arrow/pull/7656 Quite a few issues showed up: * utf8proc doesn't store and expose the information if a codepoint is of a Numeric type, thus we cannot implement isdigit/isnumeric (and also isalnum)

[GitHub] [arrow] martindurant commented on a change in pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-07 Thread GitBox
martindurant commented on a change in pull request #7545: URL: https://github.com/apache/arrow/pull/7545#discussion_r450835383 ## File path: python/pyarrow/fs.py ## @@ -63,6 +63,31 @@ def __getattr__(name): ) +def _ensure_filesystem(filesystem, use_mmap=False): +

[GitHub] [arrow] kszucs commented on a change in pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7655: URL: https://github.com/apache/arrow/pull/7655#discussion_r450799483 ## File path: python/pyarrow/_fs.pyx ## @@ -484,10 +484,17 @@ cdef class FileSystem: -- path : str The path of the

[GitHub] [arrow] pitrou opened a new pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
pitrou opened a new pull request #7655: URL: https://github.com/apache/arrow/pull/7655 Add a separate method DeleteRootDirContents, in case the operation of wiping the root directory is really desired.

[GitHub] [arrow] kszucs commented on a change in pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7655: URL: https://github.com/apache/arrow/pull/7655#discussion_r450799961 ## File path: python/pyarrow/fs.py ## @@ -146,13 +146,22 @@ def create_dir(self, path, recursive): def delete_dir(self, path):

[GitHub] [arrow] pitrou commented on a change in pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
pitrou commented on a change in pull request #7655: URL: https://github.com/apache/arrow/pull/7655#discussion_r450799978 ## File path: python/pyarrow/_fs.pyx ## @@ -484,10 +484,17 @@ cdef class FileSystem: -- path : str The path of the

[GitHub] [arrow] mr-smidge opened a new pull request #7657: ARROW-8886: [C#] Resize to negative length no longer permitted

2020-07-07 Thread GitBox
mr-smidge opened a new pull request #7657: URL: https://github.com/apache/arrow/pull/7657 The C# implementation of buffer builders previously accepted a negative length to the `Resize()` method, and clamped it to zero. However, based on [comments by the original author of that

[GitHub] [arrow] kszucs commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r450786728 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +47,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" Review comment: @kou it

[GitHub] [arrow] kszucs commented on a change in pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
kszucs commented on a change in pull request #7655: URL: https://github.com/apache/arrow/pull/7655#discussion_r450798098 ## File path: cpp/src/arrow/filesystem/s3fs.cc ## @@ -1433,6 +1433,10 @@ Status S3FileSystem::DeleteDirContents(const std::string& s) { return

[GitHub] [arrow] github-actions[bot] commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-654809137 https://issues.apache.org/jira/browse/ARROW-9268

[GitHub] [arrow] github-actions[bot] commented on pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7659: URL: https://github.com/apache/arrow/pull/7659#issuecomment-654907022 https://issues.apache.org/jira/browse/ARROW-9287

[GitHub] [arrow] kszucs closed pull request #7655: ARROW-9121: [C++] Forbid empty or root path in FileSystem::DeleteDirContents

2020-07-07 Thread GitBox
kszucs closed pull request #7655: URL: https://github.com/apache/arrow/pull/7655

[GitHub] [arrow] nealrichardson commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
nealrichardson commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654929934 > Also, note that you can transfer a Table as a sequence of RecordBatches, rather than a sequence of ChunkedArrays. But Tables *are* a sequence of ChunkedArrays,

[GitHub] [arrow] kszucs opened a new pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox
kszucs opened a new pull request #7658: URL: https://github.com/apache/arrow/pull/7658 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] kszucs commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox
kszucs commented on pull request #7658: URL: https://github.com/apache/arrow/pull/7658#issuecomment-654866696 @github-actions crossbow submit wheel-win-* This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow] praveenbingo commented on a change in pull request #7653: ARROW-9343: [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces

2020-07-07 Thread GitBox
praveenbingo commented on a change in pull request #7653: URL: https://github.com/apache/arrow/pull/7653#discussion_r450928004 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -672,4 +674,27 @@ const char* replace_utf8_utf8_utf8(gdv_int64 context, const char*

[GitHub] [arrow] nealrichardson commented on a change in pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
nealrichardson commented on a change in pull request #7648: URL: https://github.com/apache/arrow/pull/7648#discussion_r450932956 ## File path: r/R/python.R ## @@ -73,6 +73,44 @@ r_to_py.RecordBatch <- function(x, convert = FALSE) { out } +r_to_py.ChunkedArray <-

[GitHub] [arrow] wesm opened a new pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-07 Thread GitBox
wesm opened a new pull request #7659: URL: https://github.com/apache/arrow/pull/7659 Summary of places where changes were needed: * Add to integration tests. uint64 does not work in JavaScript so this is disabled temporarily (NEEDS TICKET) * Support in

[GitHub] [arrow] kszucs commented on pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-07 Thread GitBox
kszucs commented on pull request #7659: URL: https://github.com/apache/arrow/pull/7659#issuecomment-654918991 Please add the unsigned types to the [DictionaryScalar](https://github.com/apache/arrow/blob/master/cpp/src/arrow/scalar.cc#L203) as well.

[GitHub] [arrow] github-actions[bot] commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7658: URL: https://github.com/apache/arrow/pull/7658#issuecomment-654867825 Revision: 2cffe20659d0e3a01e9f393e36a0c88e1d063594 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7658: URL: https://github.com/apache/arrow/pull/7658#issuecomment-654873095 https://issues.apache.org/jira/browse/ARROW-9305 This is an automated message from the Apache Git

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox
c-jamie commented on a change in pull request #7635: URL: https://github.com/apache/arrow/pull/7635#discussion_r450986751 ## File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc ## @@ -0,0 +1,223 @@ + +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
maartenbreddels commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-654991684 I wanted to focus on the other issues first, but I propose a change in naming by this: `__`, since all you care about is that it's a function that operates on a

[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
wesm commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-654998304 I think that having e.g. `string_lower_utf8` and `string_lower_ascii` is fine, too. This is an automated message

[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
maartenbreddels commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655001506 If it is not off the table, I propose at least utf8->unicode renaming, to be future compatible. This is

[GitHub] [arrow] wesm commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox
wesm commented on pull request #7658: URL: https://github.com/apache/arrow/pull/7658#issuecomment-654938006 Should this be made BUNDLED? This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [arrow] romainfrancois commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-07 Thread GitBox
romainfrancois commented on pull request #7660: URL: https://github.com/apache/arrow/pull/7660#issuecomment-654949268 I think we need a function for the `raws <- vctrs::new_list_of(...)` part, and probably an equivalent function for the other binary types, but `binary()`, `large_binary()`

[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
wesm commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-654949067 I haven't looked at the details yet, but could we use a normalized naming convention for the functions? * ascii_$FUNC, for functions that expect ASCII input * utf8_$FUNC,

[GitHub] [arrow] pitrou commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
pitrou commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654954241 It should :-) This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] github-actions[bot] commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-07 Thread GitBox
github-actions[bot] commented on pull request #7660: URL: https://github.com/apache/arrow/pull/7660#issuecomment-654953783 https://issues.apache.org/jira/browse/ARROW-9291 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-07 Thread GitBox
wesm commented on pull request #7659: URL: https://github.com/apache/arrow/pull/7659#issuecomment-654964007 @kszucs done This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox
c-jamie commented on a change in pull request #7635: URL: https://github.com/apache/arrow/pull/7635#discussion_r450986449 ## File path: cpp/src/arrow/compute/api_scalar.cc ## @@ -126,5 +126,24 @@ Result Compare(const Datum& left, const Datum& right, CompareOptions opti

[GitHub] [arrow] jorisvandenbossche commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-07 Thread GitBox
jorisvandenbossche commented on pull request #7623: URL: https://github.com/apache/arrow/pull/7623#issuecomment-654981357 I think @bkietz is still taking a look today? This is an automated message from the Apache Git

[GitHub] [arrow] xhochy commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
xhochy commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-654996646 I'm indifferent to `utf8_` vs `string_` (sometimes our codebase is too) but other than that I fully agree to @wesm naming scheme.

[GitHub] [arrow] xhochy commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox
xhochy commented on pull request #7656: URL: https://github.com/apache/arrow/pull/7656#issuecomment-655000427 > Are there plans to support utf16/32? Seeing the code as it is now, it would be trivial to add. This would be a longer discussion. Personally I would vote "no" to limit the

[GitHub] [arrow] pitrou commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox
pitrou commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654954793 However, it may use the offset member of arrays, which might not work with the C Data Interface... This is an
