[GitHub] [arrow] arw2019 commented on pull request #8782: ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests

2020-11-26 Thread GitBox
arw2019 commented on pull request #8782: URL: https://github.com/apache/arrow/pull/8782#issuecomment-734530401 I think the code changes here are ok. However, at least some of the build errors look related (and persisted across two CI runs) so that's left to figure out

[GitHub] [arrow] kou commented on a change in pull request #8756: ARROW-10541: [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-26 Thread GitBox
kou commented on a change in pull request #8756: URL: https://github.com/apache/arrow/pull/8756#discussion_r531281952 ## File path: cpp/cmake_modules/DefineOptions.cmake ## @@ -362,7 +362,9 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

[GitHub] [arrow] kou closed pull request #8756: ARROW-10541: [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-26 Thread GitBox
kou closed pull request #8756: URL: https://github.com/apache/arrow/pull/8756 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] arw2019 closed pull request #8782: ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in tests

2020-11-26 Thread GitBox
arw2019 closed pull request #8782: URL: https://github.com/apache/arrow/pull/8782

[GitHub] [arrow] liyafan82 closed pull request #8721: ARROW-10662: [Java] Avoid integer overflow for Json file reader

2020-11-26 Thread GitBox
liyafan82 closed pull request #8721: URL: https://github.com/apache/arrow/pull/8721

[GitHub] [arrow] github-actions[bot] commented on pull request #8775: ARROW-10742: [Python] Check mask when creating array from numpy

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8775: URL: https://github.com/apache/arrow/pull/8775#issuecomment-734212791 https://issues.apache.org/jira/browse/ARROW-10742

[GitHub] [arrow] pitrou commented on pull request #8770: ARROW-10696: [C++] Add SetBitRunReader

2020-11-26 Thread GitBox
pitrou commented on pull request #8770: URL: https://github.com/apache/arrow/pull/8770#issuecomment-734223439 Aggregation benchmarks: ``` benchmark baseline contender change %

[GitHub] [arrow] xhochy closed pull request #8756: ARROW-10541: [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-26 Thread GitBox
xhochy closed pull request #8756: URL: https://github.com/apache/arrow/pull/8756

[GitHub] [arrow] chrisavl opened a new pull request #8775: ARROW-10742 [Python] Check mask when creating array from numpy

2020-11-26 Thread GitBox
chrisavl opened a new pull request #8775: URL: https://github.com/apache/arrow/pull/8775 This change adds checks so that the same exceptions are raised when creating an array with a mask using a python sequence or a numpy array.

[GitHub] [arrow] pitrou edited a comment on pull request #8770: ARROW-10696: [C++] Add SetBitRunReader

2020-11-26 Thread GitBox
pitrou edited a comment on pull request #8770: URL: https://github.com/apache/arrow/pull/8770#issuecomment-734223439 Aggregation benchmarks: ``` benchmark baseline contender change %

[GitHub] [arrow] xhochy commented on a change in pull request #8756: ARROW-10541: [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-26 Thread GitBox
xhochy commented on a change in pull request #8756: URL: https://github.com/apache/arrow/pull/8756#discussion_r530938306 ## File path: cpp/cmake_modules/DefineOptions.cmake ## @@ -362,7 +362,9 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_CURRENT_SOURCE_DIR}")

[GitHub] [arrow] alamb commented on a change in pull request #8751: ARROW-10584: [Rust] [DataFusion] Add SQL support for JOIN ON syntax

2020-11-26 Thread GitBox
alamb commented on a change in pull request #8751: URL: https://github.com/apache/arrow/pull/8751#discussion_r530957868 ## File path: rust/datafusion/src/sql/planner.rs ## @@ -628,6 +701,53 @@ impl<'a, S: SchemaProvider> SqlToRel<'a, S> { } } +fn

[GitHub] [arrow] xhochy commented on pull request #8756: ARROW-10541: [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-26 Thread GitBox
xhochy commented on pull request #8756: URL: https://github.com/apache/arrow/pull/8756#issuecomment-734224862 @github-actions autotune

[GitHub] [arrow] pitrou commented on a change in pull request #8621: ARROW-9128: [C++] Implement string space trimming kernels: trim, ltrim, and rtrim

2020-11-26 Thread GitBox
pitrou commented on a change in pull request #8621: URL: https://github.com/apache/arrow/pull/8621#discussion_r530997361 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -1231,6 +1251,302 @@ Result StrptimeResolve(KernelContext* ctx, const std::vector

[GitHub] [arrow] nevi-me closed pull request #8773: ARROW-10268: [Rust] Write out non-nested dictionaries in the IPC format

2020-11-26 Thread GitBox
nevi-me closed pull request #8773: URL: https://github.com/apache/arrow/pull/8773

[GitHub] [arrow] github-actions[bot] commented on pull request #8776: ARROW-5679: [Python][CI] Remove Python 3.5 support

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8776: URL: https://github.com/apache/arrow/pull/8776#issuecomment-734325402 Revision: 4f279895dc900c5862732ecea3f8ce6526db2027 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] pitrou commented on pull request #8774: PARQUET-1566: [C++] Indicate if null count, distinct count are present in column statistics

2020-11-26 Thread GitBox
pitrou commented on pull request #8774: URL: https://github.com/apache/arrow/pull/8774#issuecomment-734304025 Note: CI failure is unrelated.

[GitHub] [arrow] pitrou commented on pull request #8776: ARROW-5679: [Python][CI] Remove Python 3.5 support

2020-11-26 Thread GitBox
pitrou commented on pull request #8776: URL: https://github.com/apache/arrow/pull/8776#issuecomment-734324232 @github-actions crossbow submit -g python

[GitHub] [arrow] pitrou commented on a change in pull request #8755: ARROW-10709: [Python] Allow PythonFile.read() to always return a buffer

2020-11-26 Thread GitBox
pitrou commented on a change in pull request #8755: URL: https://github.com/apache/arrow/pull/8755#discussion_r530985982 ## File path: cpp/src/arrow/python/io.cc ## @@ -199,25 +219,32 @@ Result PyReadableFile::Read(int64_t nbytes, void* out) { PyObject* bytes_obj =

[GitHub] [arrow] nevi-me commented on a change in pull request #8731: [Rust] [RFC] Native Rust Arrow SQL IO

2020-11-26 Thread GitBox
nevi-me commented on a change in pull request #8731: URL: https://github.com/apache/arrow/pull/8731#discussion_r531034102 ## File path: rust/datafusion/examples/database_sql.rs ## @@ -0,0 +1,59 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8774: PARQUET-1566: [C++] Indicate if null count, distinct count are present in column statistics

2020-11-26 Thread GitBox
jorisvandenbossche commented on a change in pull request #8774: URL: https://github.com/apache/arrow/pull/8774#discussion_r531036980 ## File path: cpp/src/parquet/statistics.h ## @@ -206,16 +206,25 @@ class PARQUET_EXPORT Statistics { /// \param[in] null_count number of

[GitHub] [arrow] pitrou opened a new pull request #8776: ARROW-5679: [Python][CI] Remove Python 3.5 support

2020-11-26 Thread GitBox
pitrou opened a new pull request #8776: URL: https://github.com/apache/arrow/pull/8776

[GitHub] [arrow] Dandandan commented on pull request #8771: ARROW-10740: [Rust][DataFusion] Remove redundant clones found by clippy

2020-11-26 Thread GitBox
Dandandan commented on pull request #8771: URL: https://github.com/apache/arrow/pull/8771#issuecomment-734317444 Thanks @nevi-me. I found the comment / previous issue, thanks for the link to the Jira story. Makes sense to coordinate / plan this a bit :+1:

[GitHub] [arrow] github-actions[bot] commented on pull request #8776: ARROW-5679: [Python][CI] Remove Python 3.5 support

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8776: URL: https://github.com/apache/arrow/pull/8776#issuecomment-734317354 https://issues.apache.org/jira/browse/ARROW-5679

[GitHub] [arrow] jorisvandenbossche commented on pull request #8776: ARROW-5679: [Python][CI] Remove Python 3.5 support

2020-11-26 Thread GitBox
jorisvandenbossche commented on pull request #8776: URL: https://github.com/apache/arrow/pull/8776#issuecomment-734335703 Looking at that comment with the list of triggered builds, we should add builds with Python 3.9 to our CI. But that's another issue :)

[GitHub] [arrow] Dandandan opened a new pull request #8779: ARROW-10745: [Rust] Directly allocate padding bytes in filter context

2020-11-26 Thread GitBox
Dandandan opened a new pull request #8779: URL: https://github.com/apache/arrow/pull/8779 A thing I noticed while going through some code. When creating a `MutableBuffer` here, only `filter_bytes.len()` bytes are allocated, but the capacity can afterwards be increased when adding

[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

2020-11-26 Thread GitBox
nevi-me commented on a change in pull request #8739: URL: https://github.com/apache/arrow/pull/8739#discussion_r531148917 ## File path: rust/arrow/src/array/equal/structure.rs ## @@ -30,19 +32,51 @@ fn equal_values( .iter() .zip(rhs.child_data())

[GitHub] [arrow] nevi-me commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

2020-11-26 Thread GitBox
nevi-me commented on a change in pull request #8739: URL: https://github.com/apache/arrow/pull/8739#discussion_r531155130 ## File path: rust/arrow/src/array/equal/structure.rs ## @@ -30,19 +32,51 @@ fn equal_values( .iter() .zip(rhs.child_data())

[GitHub] [arrow] pitrou commented on a change in pull request #8468: ARROW-10306: [C++] Add string replacement kernel

2020-11-26 Thread GitBox
pitrou commented on a change in pull request #8468: URL: https://github.com/apache/arrow/pull/8468#discussion_r531163812 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -1194,6 +1198,197 @@ void AddSplit(FunctionRegistry* registry) { #endif } +//

[GitHub] [arrow] github-actions[bot] commented on pull request #8779: ARROW-10745: [Rust] Directly allocate padding bytes in filter context

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8779: URL: https://github.com/apache/arrow/pull/8779#issuecomment-734404792 https://issues.apache.org/jira/browse/ARROW-10745

[GitHub] [arrow] alamb commented on a change in pull request #8739: ARROW-10684: [Rust] Inherit struct nulls in child null equality

2020-11-26 Thread GitBox
alamb commented on a change in pull request #8739: URL: https://github.com/apache/arrow/pull/8739#discussion_r529030845 ## File path: rust/arrow/src/array/equal/structure.rs ## @@ -30,19 +32,51 @@ fn equal_values( .iter() .zip(rhs.child_data())

[GitHub] [arrow] paddyhoran commented on pull request #8664: ARROW-10588: [Rust] Safe and parallel bit operations for Arrow

2020-11-26 Thread GitBox
paddyhoran commented on pull request #8664: URL: https://github.com/apache/arrow/pull/8664#issuecomment-734413544 Hi @vertexclique. All your contributions are very much appreciated. You are one of the most advanced contributors to the project meaning that it's important for the

[GitHub] [arrow] github-actions[bot] commented on pull request #8781: ARROW-10747: [Rust]: WIP CSV reader optimization

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8781: URL: https://github.com/apache/arrow/pull/8781#issuecomment-734481779 https://issues.apache.org/jira/browse/ARROW-10747

[GitHub] [arrow] Dandandan commented on pull request #8781: ARROW-10747: [Rust]: CSV reader optimization

2020-11-26 Thread GitBox
Dandandan commented on pull request #8781: URL: https://github.com/apache/arrow/pull/8781#issuecomment-734488236 I found some further opportunities for optimizing by also reusing the stringrecord items, for another speed up.

[GitHub] [arrow] Dandandan commented on pull request #8781: ARROW-10747: [Rust]: CSV reader optimization

2020-11-26 Thread GitBox
Dandandan commented on pull request #8781: URL: https://github.com/apache/arrow/pull/8781#issuecomment-734490178 Is ready for review now.

[GitHub] [arrow] arw2019 commented on a change in pull request #8774: PARQUET-1566: [C++] Indicate if null count, distinct count are present in column statistics

2020-11-26 Thread GitBox
arw2019 commented on a change in pull request #8774: URL: https://github.com/apache/arrow/pull/8774#discussion_r531128832 ## File path: cpp/src/parquet/statistics.h ## @@ -206,16 +206,25 @@ class PARQUET_EXPORT Statistics { /// \param[in] null_count number of null values

[GitHub] [arrow] github-actions[bot] commented on pull request #8778: WIP: ARROW-10224

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8778: URL: https://github.com/apache/arrow/pull/8778#issuecomment-734393581 Revision: 9dcade3ee0a4f0c6a712df9869665d103e167a8e Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] github-actions[bot] commented on pull request #8777: ARROW-10569: [C++] Improve table filtering performance

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8777: URL: https://github.com/apache/arrow/pull/8777#issuecomment-734382061 https://issues.apache.org/jira/browse/ARROW-10569

[GitHub] [arrow] xhochy opened a new pull request #8778: WIP: ARROW-10224

2020-11-26 Thread GitBox
xhochy opened a new pull request #8778: URL: https://github.com/apache/arrow/pull/8778 Debugging for #8386

[GitHub] [arrow] pitrou opened a new pull request #8777: ARROW-10569: [C++] Improve table filtering performance

2020-11-26 Thread GitBox
pitrou opened a new pull request #8777: URL: https://github.com/apache/arrow/pull/8777 Instead of applying the same boolean filter to all N columns, first convert the filter to take indices. Selecting indices of a linear array is much faster, thanks to avoiding bit-unpacking on the
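
The idea described in this PR can be illustrated with a minimal pure-Python sketch. This is not Arrow's C++ implementation: the function names (`filter_to_indices`, `take`, `filter_table`) and the dict-of-lists table layout are hypothetical, chosen only to show that the boolean mask is scanned once and the resulting indices are reused for every column.

```python
def filter_to_indices(mask):
    """Scan the boolean mask once and return the positions to keep."""
    return [i for i, keep in enumerate(mask) if keep]


def take(column, indices):
    """Gather the values of one column at the given positions."""
    return [column[i] for i in indices]


def filter_table(columns, mask):
    """Filter every column of a table (hypothetical dict of lists).

    The mask is converted to indices a single time; each of the N columns
    is then gathered with the same index list instead of re-reading the mask.
    """
    indices = filter_to_indices(mask)
    return {name: take(values, indices) for name, values in columns.items()}


table = {"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]}
mask = [True, False, True, False]
filtered = filter_table(table, mask)
```

In this sketch the saving is only that the mask is read once rather than once per column; the bit-unpacking cost mentioned in the PR description applies to Arrow's packed boolean arrays and is not modelled here.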

[GitHub] [arrow] xhochy commented on pull request #8778: WIP: ARROW-10224

2020-11-26 Thread GitBox
xhochy commented on pull request #8778: URL: https://github.com/apache/arrow/pull/8778#issuecomment-734392842 @github-actions crossbow submit wheel-win-cp39

[GitHub] [arrow] Dandandan opened a new pull request #8781: ARROW-10747: [DataFusion]: WIP CSV reader optimization

2020-11-26 Thread GitBox
Dandandan opened a new pull request #8781: URL: https://github.com/apache/arrow/pull/8781 Still WIP, but makes CSV reading (quite a bit) faster by reusing more allocations and doing things a bit more manually. The nytaxi benchmark speeds up from ~4500ms to ~3500ms. I think

[GitHub] [arrow] arw2019 opened a new pull request #8782: ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in parquet encoding tests

2020-11-26 Thread GitBox
arw2019 opened a new pull request #8782: URL: https://github.com/apache/arrow/pull/8782 As per a TODO left in ARROW-3769 / #3721 we can now use the `GTEST_SKIP` macro in `parquet/encoding-test.cpp`. `GTEST_SKIP` was added in gtest 1.10.0 so this involves bumping our minimal gtest version

[GitHub] [arrow] github-actions[bot] commented on pull request #8781: ARROW-10747: [DataFusion]: WIP CSV reader optimization

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8781: URL: https://github.com/apache/arrow/pull/8781#issuecomment-734472724 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] github-actions[bot] commented on pull request #8782: ARROW-10746: [C++] Bump gtest version + use GTEST_SKIP in parquet encoding tests

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8782: URL: https://github.com/apache/arrow/pull/8782#issuecomment-734472714 https://issues.apache.org/jira/browse/ARROW-10746

[GitHub] [arrow] velvia commented on a change in pull request #8688: ARROW-10330: [Rust][DataFusion] Implement NULLIF() SQL function

2020-11-26 Thread GitBox
velvia commented on a change in pull request #8688: URL: https://github.com/apache/arrow/pull/8688#discussion_r531184824 ## File path: rust/arrow/src/compute/kernels/boolean.rs ## @@ -149,6 +150,64 @@ pub fn is_not_null(input: ) -> Result {

[GitHub] [arrow] keeratsingh opened a new pull request #8780: [POC] ARROW-10671: [FlightRPC] Bearer Token refresh design with retry

2020-11-26 Thread GitBox
keeratsingh opened a new pull request #8780: URL: https://github.com/apache/arrow/pull/8780 - This is a POC for the proposed design [Link](https://docs.google.com/document/d/187DlGpIpOUPGhWvXVQEq0mXw_hdWjzzOuZp0p5qzBp0/edit?usp=sharing) - This POC only adds the retry capability to

[GitHub] [arrow] github-actions[bot] commented on pull request #8780: [POC] ARROW-10671: [FlightRPC] Bearer Token refresh design with retry

2020-11-26 Thread GitBox
github-actions[bot] commented on pull request #8780: URL: https://github.com/apache/arrow/pull/8780#issuecomment-734457340 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then