[jira] [Created] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits
Francois Saint-Jacques created ARROW-4268: - Summary: [C++] Add C primitive to Arrow:Type compile time in TypeTraits Key: ARROW-4268 URL: https://issues.apache.org/jira/browse/ARROW-4268 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques The user would use something like ``` ... using ArrowType = CTypeTraits::ArrowType; using ArrayType = CTypeTraits::ArrayType; auto type = CTypeTraits::type_singleton(); ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
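A minimal sketch of the kind of trait the message describes, assuming it is a class template specialized per C type. The `DataType`, `Int64Type`, `DoubleType` and `NumericArray` definitions below are hypothetical stand-ins written for this example, not Arrow's real classes.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for a logical type hierarchy (not Arrow's classes).
struct DataType {
  virtual ~DataType() = default;
  virtual std::string name() const = 0;
};
struct Int64Type : DataType {
  std::string name() const override { return "int64"; }
};
struct DoubleType : DataType {
  std::string name() const override { return "double"; }
};

// Hypothetical array wrapper, parameterized on the logical type.
template <typename T>
struct NumericArray {};

// The primary template is declared but not defined, so asking for the traits
// of an unsupported C type is a compile-time error.
template <typename CType>
struct CTypeTraits;

template <>
struct CTypeTraits<int64_t> {
  using ArrowType = Int64Type;
  using ArrayType = NumericArray<Int64Type>;
  static std::shared_ptr<DataType> type_singleton() {
    static const std::shared_ptr<DataType> instance = std::make_shared<Int64Type>();
    return instance;
  }
};

template <>
struct CTypeTraits<double> {
  using ArrowType = DoubleType;
  using ArrayType = NumericArray<DoubleType>;
  static std::shared_ptr<DataType> type_singleton() {
    static const std::shared_ptr<DataType> instance = std::make_shared<DoubleType>();
    return instance;
  }
};

// The mapping is resolved entirely at compile time.
static_assert(std::is_same<CTypeTraits<int64_t>::ArrowType, Int64Type>::value,
              "int64_t should map to Int64Type");
```

With this shape, generic code can go from a C primitive to its logical type without a runtime switch.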
[jira] [Created] (ARROW-3998) Support TPC-H dbgen in Arrow
Francois Saint-Jacques created ARROW-3998: - Summary: Support TPC-H dbgen in Arrow Key: ARROW-3998 URL: https://issues.apache.org/jira/browse/ARROW-3998 Project: Apache Arrow Issue Type: Wish Reporter: Francois Saint-Jacques Integration tests and benchmarks should read TPC-H data. This is going to be useful for future query execution engine benchmarking. It could also attract researchers.
[jira] [Created] (ARROW-3990) [Python] developer documentation is missing double-conversion dep
Francois Saint-Jacques created ARROW-3990: - Summary: [Python] developer documentation is missing double-conversion dep Key: ARROW-3990 URL: https://issues.apache.org/jira/browse/ARROW-3990 Project: Apache Arrow Issue Type: Bug Components: Documentation Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques
[jira] [Created] (ARROW-4084) Simplify Status and stringstream boilerplate
Francois Saint-Jacques created ARROW-4084: - Summary: Simplify Status and stringstream boilerplate Key: ARROW-4084 URL: https://issues.apache.org/jira/browse/ARROW-4084 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques There's a lot of stringstream repetition when creating a Status.
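One way the repetition could be removed is a variadic factory that streams its arguments into a single `std::ostringstream` internally. The `Status` class below is a simplified, hypothetical stand-in written for this sketch; it is not Arrow's actual class and is not necessarily the design the issue settled on.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Simplified, hypothetical Status with a single error kind.
class Status {
 public:
  Status() = default;
  static Status OK() { return Status(); }

  // Callers pass message fragments directly; no stringstream at the call site.
  template <typename... Args>
  static Status Invalid(Args&&... args) {
    return Status(Join(std::forward<Args>(args)...));
  }

  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}

  template <typename... Args>
  static std::string Join(Args&&... args) {
    std::ostringstream ss;
    // C++11-compatible pack expansion: stream each argument in order.
    using expand = int[];
    (void)expand{0, ((void)(ss << std::forward<Args>(args)), 0)...};
    return ss.str();
  }

  bool ok_ = true;
  std::string msg_;
};
```

A call site then shrinks to a single expression such as `Status::Invalid("expected ", 3, " columns, got ", 5)`.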
[jira] [Created] (ARROW-4102) [C++] FixedSizeBinary identity cast not implemented
Francois Saint-Jacques created ARROW-4102: - Summary: [C++] FixedSizeBinary identity cast not implemented Key: ARROW-4102 URL: https://issues.apache.org/jira/browse/ARROW-4102 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.12.0
[jira] [Created] (ARROW-3862) Improve dependencies download script
Francois Saint-Jacques created ARROW-3862: - Summary: Improve dependencies download script Key: ARROW-3862 URL: https://issues.apache.org/jira/browse/ARROW-3862 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-4407) [CMake] ExternalProject_Add does not capture CC/CXX correctly
Francois Saint-Jacques created ARROW-4407: - Summary: [CMake] ExternalProject_Add does not capture CC/CXX correctly Key: ARROW-4407 URL: https://issues.apache.org/jira/browse/ARROW-4407 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.12.0 Reporter: Francois Saint-Jacques The issue is that the CC/CXX environment variables are captured on the first invocation of the builder (e.g. make or ninja) instead of when CMake is invoked in the build directory. This can lead to compilation errors (notably when compiling with clang in the top directory due to the addition of the `-Qunused-arguments` option). I hit this because I have a script that prepares the build directory and exports CXX within the script. When I jump into the build folder, there's a mismatch between the compiler used for the external gbenchmark (and all deps if conda is not used) and the one used for the build. To reproduce: # Create a new build directory with clang as the compiler, but don't build yet # In a new shell (without the compiler environment variable), go into the directory and invoke make/ninja
[jira] [Created] (ARROW-5070) [Benchmarking] Support for on-demand and automated benchmarks
Francois Saint-Jacques created ARROW-5070: - Summary: [Benchmarking] Support for on-demand and automated benchmarks Key: ARROW-5070 URL: https://issues.apache.org/jira/browse/ARROW-5070 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques We want to be able to request a benchmark comparison of a PR against master. This should be triggered via a GitHub comment. The automated benchmarks would track the master branch and be triggered either on each merge or nightly.
[jira] [Created] (ARROW-5071) [C++] Write CMake benchmark wrappers
Francois Saint-Jacques created ARROW-5071: - Summary: [C++] Write CMake benchmark wrappers Key: ARROW-5071 URL: https://issues.apache.org/jira/browse/ARROW-5071 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.14.0 Write a script that wraps a google benchmark and modifies the output to match the format required by the database schema.
[jira] [Created] (ARROW-5011) [Release] Add support in the source release script for custom hash
Francois Saint-Jacques created ARROW-5011: - Summary: [Release] Add support in the source release script for custom hash Key: ARROW-5011 URL: https://issues.apache.org/jira/browse/ARROW-5011 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques Fix For: 0.13.0 This is a minor feature to help debug said script by overriding the git-archive hash instead of using the hash inferred from the release tag.
[jira] [Created] (ARROW-5010) [Release] Fix release script with llvm-7
Francois Saint-Jacques created ARROW-5010: - Summary: [Release] Fix release script with llvm-7 Key: ARROW-5010 URL: https://issues.apache.org/jira/browse/ARROW-5010 Project: Apache Arrow Issue Type: Bug Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Source release script fails to compile gandiva because it requires llvm-7 and only llvm-6 is available in the ubuntu18 docker image.
[jira] [Created] (ARROW-5036) [C++] Serialization tests resort to memcpy to check equality
Francois Saint-Jacques created ARROW-5036: - Summary: [C++] Serialization tests resort to memcpy to check equality Key: ARROW-5036 URL: https://issues.apache.org/jira/browse/ARROW-5036 Project: Apache Arrow Issue Type: Bug Components: C++ - Plasma Reporter: Francois Saint-Jacques Fix For: 0.14.0 {code:shell} 1: /tmp/arrow-0.13.0.Q4czW/apache-arrow-0.13.0/cpp/src/plasma/test/serialization_tests.cc:193: Failure 1: Expected equality of these values: 1: memcmp(_objects[object_ids[0]], _objects_return[0], sizeof(PlasmaObject)) 1: Which is: 45 1: 0 1: [ FAILED ] PlasmaSerialization.GetReply (0 ms) {code} The source of the problem is the stack-allocated random_plasma_object object. As a fix, I propose that PlasmaObject implement `operator==` and that the tests drop the memcmp equality check.
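A sketch of the proposed field-wise comparison. One plausible reason a byte-wise comparison is fragile here, which the issue does not confirm, is that padding bytes of a stack-allocated struct are left uninitialized, so two objects with identical fields can still differ under `memcmp`. `ExampleObject` and its fields are hypothetical; the real `PlasmaObject` layout may differ.

```cpp
#include <cstdint>

// Hypothetical struct for illustration only. On common 64-bit ABIs there are
// padding bytes after store_fd and after device_num.
struct ExampleObject {
  int32_t store_fd;
  int64_t data_offset;
  int64_t data_size;
  int32_t device_num;
};

// Compares the fields only; padding bytes never take part in the result.
inline bool operator==(const ExampleObject& a, const ExampleObject& b) {
  return a.store_fd == b.store_fd && a.data_offset == b.data_offset &&
         a.data_size == b.data_size && a.device_num == b.device_num;
}

inline bool operator!=(const ExampleObject& a, const ExampleObject& b) {
  return !(a == b);
}
```

A test can then assert `a == b` directly instead of comparing `sizeof(ExampleObject)` raw bytes.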
[jira] [Created] (ARROW-5005) [C++] Add support for filter mask in AggregateFunction
Francois Saint-Jacques created ARROW-5005: - Summary: [C++] Add support for filter mask in AggregateFunction Key: ARROW-5005 URL: https://issues.apache.org/jira/browse/ARROW-5005 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.14.0 The aggregate kernels don't support a mask (the result of a filter). Add the following method to `AggregateFunction`. {code:c++} virtual Status ConsumeWithFilter(const Array& input, const Array& mask, void* state) const = 0; {code}
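To illustrate the intended semantics, here is a simplified sketch of a sum aggregate that consumes only the values selected by the mask. It uses `std::vector` in place of `Array` and a `bool` return in place of `Status`; `SumState` and this free-function signature are assumptions made for the example, not the proposed API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aggregate state for a sum kernel.
struct SumState {
  int64_t sum = 0;
  int64_t count = 0;
};

// Folds into `state` only the elements of `input` whose mask entry is true.
// Returns false if the input and the mask differ in length.
inline bool ConsumeWithFilter(const std::vector<int64_t>& input,
                              const std::vector<bool>& mask, SumState* state) {
  if (input.size() != mask.size()) return false;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (mask[i]) {
      state->sum += input[i];
      ++state->count;
    }
  }
  return true;
}
```

Passing the mask alongside the input avoids materializing a filtered copy of the array before aggregating.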
[jira] [Created] (ARROW-5007) [C++] Move DCHECK out of sse-utils
Francois Saint-Jacques created ARROW-5007: - Summary: [C++] Move DCHECK out of sse-utils Key: ARROW-5007 URL: https://issues.apache.org/jira/browse/ARROW-5007 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Some users tried to compile arrow on ppc64, but they face the following error {code:bash} In file included from /root/repos/arrow/cpp/src/arrow/json/chunker.h:26:0, from /root/repos/arrow/cpp/src/arrow/json/chunker.cc:18: /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘__m128i arrow::SSE4_cmpestrm(__m128i, int, __m128i, int)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: error: there are no arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of ‘DCHECK’ must be available [-fpermissive] DCHECK(false) << "CPU doesn't support SSE 4.2"; ^~ /root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated) /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘int arrow::SSE4_cmpestri(__m128i, int, __m128i, int)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:131:3: error: there are no arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of ‘DCHECK’ must be available [-fpermissive] DCHECK(false) << "CPU doesn't support SSE 4.2"; ^~ /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u8(uint32_t, uint8_t)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:136:3: error: ‘DCHECK’ was not declared in this scope DCHECK(false) << "SSE support is not enabled"; ^~ /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u16(uint32_t, uint16_t)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:141:3: error: ‘DCHECK’ was not declared in this scope DCHECK(false) << "SSE support is not enabled"; ^~ /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t 
arrow::SSE4_crc32_u32(uint32_t, uint32_t)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:146:3: error: ‘DCHECK’ was not declared in this scope DCHECK(false) << "SSE support is not enabled"; ^~ /root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t arrow::SSE4_crc32_u64(uint32_t, uint64_t)’: /root/repos/arrow/cpp/src/arrow/util/sse-util.h:151:3: error: ‘DCHECK’ was not declared in this scope DCHECK(false) << "SSE support is not enabled"; {code} By including `logging.h` or removing `DCHECK`, they can compile. The fix should be to refactor the SSE detection macro out of this file, so that code which only needs the detection can include a header containing just the macro instead of this file.
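A rough sketch of what a detection-only header could look like, together with one caller. The macro name `EXAMPLE_HAVE_SSE4_2` and the function `ExampleCrc32U8` are invented for this example, and the portable fallback is a choice made here rather than something the issue proposes; the original code used `DCHECK(false)` on the non-SSE path.

```cpp
#include <cstdint>

// Detection only: defines a macro and depends on no logging facilities.
#if defined(__SSE4_2__)
#define EXAMPLE_HAVE_SSE4_2 1
#include <nmmintrin.h>
#else
#define EXAMPLE_HAVE_SSE4_2 0
#endif

// CRC32C update for a single byte. Uses the hardware instruction when the
// compiler targets SSE 4.2 and a bitwise software loop otherwise.
inline uint32_t ExampleCrc32U8(uint32_t crc, uint8_t value) {
#if EXAMPLE_HAVE_SSE4_2
  return _mm_crc32_u8(crc, value);
#else
  crc ^= value;
  for (int bit = 0; bit < 8; ++bit) {
    // 0x82F63B78 is the bit-reflected Castagnoli polynomial.
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
#endif
}
```

Because both paths compute the same function, the code compiles and behaves identically on targets such as ppc64 where the SSE path is unavailable.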
[jira] [Created] (ARROW-4673) [C++] Implement AssertDatumEquals
Francois Saint-Jacques created ARROW-4673: - Summary: [C++] Implement AssertDatumEquals Key: ARROW-4673 URL: https://issues.apache.org/jira/browse/ARROW-4673 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Aggregate tests could benefit from this.
[jira] [Created] (ARROW-4694) [CI] detect-changes.py is inconsistent
Francois Saint-Jacques created ARROW-4694: - Summary: [CI] detect-changes.py is inconsistent Key: ARROW-4694 URL: https://issues.apache.org/jira/browse/ARROW-4694 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration Affects Versions: 0.12.1 Reporter: Francois Saint-Jacques Some examples of pull requests with wrongly detected affected files: - [pr-3762|https://github.com/apache/arrow/pull/3762/files] shouldn't trigger [javascript|https://travis-ci.org/apache/arrow/jobs/498805479#L217] - [pr-3767|https://github.com/apache/arrow/pull/3767/files] shouldn't affect files found in [rust|https://travis-ci.org/apache/arrow/jobs/499122044] and [javascript|https://travis-ci.org/apache/arrow/jobs/499122041#L217] [get_travis_commit_range|https://github.com/apache/arrow/blob/master/ci/detect-changes.py#L63-L67] references the following [comment|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-139811122]. If you read further down in the [thread|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-434532772], you'll note that it can go bonkers due to shallowness and the commit of branch creation. I'm not sure if this is the issue.
[jira] [Created] (ARROW-4696) Verify release script is over optimist with CUDA detection
Francois Saint-Jacques created ARROW-4696: - Summary: Verify release script is over optimist with CUDA detection Key: ARROW-4696 URL: https://issues.apache.org/jira/browse/ARROW-4696 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Francois Saint-Jacques I have an NVIDIA GPU without CUDA installed. Every time I run the verification scripts, they fail in the middle because ARROW_HAVE_CUDA evaluates to yes when `nvidia-smi --list-gpus` succeeds. Since verification is a long process, this is costly if I forget about it. Would it be better to check for `CUDA_HOME`?
[jira] [Created] (ARROW-4728) [Javascript] Failing test Table#assign with a zero-length Null column round-trips through serialization
Francois Saint-Jacques created ARROW-4728: - Summary: [Javascript] Failing test Table#assign with a zero-length Null column round-trips through serialization Key: ARROW-4728 URL: https://issues.apache.org/jira/browse/ARROW-4728 Project: Apache Arrow Issue Type: Bug Components: JavaScript Affects Versions: 0.12.1 Reporter: Francois Saint-Jacques Fix For: 0.13.0 See https://travis-ci.org/apache/arrow/jobs/500414242#L1002 {code:javascript} ● Table#serialize() › Table#assign with an empty table round-trips through serialization expect(received).toBe(expected) // Object.is equality Expected: 86 Received: 41 91 | const source = table1.assign(Table.empty()); 92 | expect(source.numCols).toBe(table1.numCols); > 93 | expect(source.length).toBe(table1.length); | ^ 94 | const result = Table.from(source.serialize()); 95 | expect(result).toEqualTable(source); 96 | expect(result.schema.metadata.get('foo')).toEqual('bar'); at Object.test (test/unit/table/serialize-tests.ts:93:35) ● Table#serialize() › Table#assign with a zero-length Null column round-trips through serialization expect(received).toBe(expected) // Object.is equality {code}
[jira] [Created] (ARROW-4729) [C++] Improve buffer symbolic index
Francois Saint-Jacques created ARROW-4729: - Summary: [C++] Improve buffer symbolic index Key: ARROW-4729 URL: https://issues.apache.org/jira/browse/ARROW-4729 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.12.1 Reporter: Francois Saint-Jacques The `buffers` vector of the array data is indexed differently depending on the Array type. This feature would expose named static constexpr variables for the buffer indices.
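A small sketch of what such named indices might look like. The buffer order shown (validity bitmap first, then values for primitive arrays, or offsets then data for variable-length binary arrays) follows the Arrow columnar layout; the struct names, constant names and the `Buffer` placeholder are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Buffer {};  // hypothetical placeholder for a memory buffer

// Named positions in the `buffers` vector for fixed-width primitive arrays.
struct PrimitiveLayout {
  static constexpr int kValidity = 0;
  static constexpr int kValues = 1;
  static constexpr int kNumBuffers = 2;
};

// Named positions for variable-length binary/string arrays.
struct BinaryLayout {
  static constexpr int kValidity = 0;
  static constexpr int kOffsets = 1;
  static constexpr int kData = 2;
  static constexpr int kNumBuffers = 3;
};

// Checks that a buffers vector has the size the given layout expects.
template <typename Layout>
bool HasExpectedBufferCount(const std::vector<std::shared_ptr<Buffer>>& buffers) {
  return buffers.size() == static_cast<std::size_t>(Layout::kNumBuffers);
}
```

Call sites can then write `buffers[BinaryLayout::kOffsets]` instead of a bare `buffers[1]`, which makes the intent visible at the point of use.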
[jira] [Created] (ARROW-4765) [JAVA][Flight] Memory leak
Francois Saint-Jacques created ARROW-4765: - Summary: [JAVA][Flight] Memory leak Key: ARROW-4765 URL: https://issues.apache.org/jira/browse/ARROW-4765 Project: Apache Arrow Issue Type: Improvement Components: FlightRPC, Java Affects Versions: 0.12.1 Reporter: Francois Saint-Jacques There is a potential race issue when reclaiming the FlightServer. {code:java} [ERROR] ensureIndependentSteams(org.apache.arrow.flight.TestBackPressure) Time elapsed: 1.394 s <<< ERROR! java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (131072) Allocator(perf-server) 0/131072/589824/9223372036854775807 (res/actual/peak/limit) at org.apache.arrow.flight.TestBackPressure.ensureIndependentSteams(TestBackPressure.java:76) {code}
[jira] [Created] (ARROW-4643) [C++] Add compiler diagnostic color when using Ninja
Francois Saint-Jacques created ARROW-4643: - Summary: [C++] Add compiler diagnostic color when using Ninja Key: ARROW-4643 URL: https://issues.apache.org/jira/browse/ARROW-4643 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Due to [ninja-ism|https://github.com/ninja-build/ninja/issues/174], this forces color of errors/warnings. Very handy for C++.
[jira] [Created] (ARROW-4654) [C++] Implicit Flight target dependencies cause compilation failure
Francois Saint-Jacques created ARROW-4654: - Summary: [C++] Implicit Flight target dependencies cause compilation failure Key: ARROW-4654 URL: https://issues.apache.org/jira/browse/ARROW-4654 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Affects Versions: 0.12.0 Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques {code:sh} In file included from ../src/arrow/flight/internal.h:23:0, from ../src/arrow/python/flight.cc:20: ../src/arrow/flight/protocol-internal.h:22:10: fatal error: arrow/flight/Flight.grpc.pb.h: No such file or directory #include "arrow/flight/Flight.grpc.pb.h" // IWYU pragma: export ^~ {code}
[jira] [Created] (ARROW-4660) [C++] gflags fails to build due to CMake error
Francois Saint-Jacques created ARROW-4660: - Summary: [C++] gflags fails to build due to CMake error Key: ARROW-4660 URL: https://issues.apache.org/jira/browse/ARROW-4660 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.13.0 Reporter: Francois Saint-Jacques gflags fails to build as a thirdparty download on Linux with CMake 3.10.2. Removing the line `target_compile_definitions(${GFLAGS_LIBRARY} INTERFACE "GFLAGS_IS_A_DLL=0")` makes it build without issue. {code} CMake Error at cmake_modules/ThirdpartyToolchain.cmake:658 (target_compile_definitions): Cannot specify compile definitions for imported target "gflags_static". Call Stack (most recent call first): CMakeLists.txt:506 (include) {code}
[jira] [Created] (ARROW-4776) [C++] DictionaryBuilder should support bootstrapping from an existing dict type
Francois Saint-Jacques created ARROW-4776: - Summary: [C++] DictionaryBuilder should support bootstrapping from an existing dict type Key: ARROW-4776 URL: https://issues.apache.org/jira/browse/ARROW-4776 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques This would mean adding a new DictionaryBuilder constructor that receives a dictionary type and performs a lazy deep copy if there's any modification. We'll have to investigate how this translates into API ergonomics.
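The "lazy deep copy" idea can be sketched as copy-on-write: the builder shares the existing dictionary until the first new value arrives, and only then copies it. `CowDictionaryBuilder` and its methods are hypothetical names for this example, the dictionary is modeled as a plain vector of strings, and the linear lookup stands in for the hash table a real builder would use.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Dictionary = std::vector<std::string>;

class CowDictionaryBuilder {
 public:
  // `existing` must not be null; it is shared, not copied, at construction.
  explicit CowDictionaryBuilder(std::shared_ptr<const Dictionary> existing)
      : shared_(std::move(existing)) {}

  // Returns the index of `value`, appending it if it is not present yet.
  int32_t GetOrInsert(const std::string& value) {
    const Dictionary& dict = current();
    for (std::size_t i = 0; i < dict.size(); ++i) {
      if (dict[i] == value) return static_cast<int32_t>(i);
    }
    if (!owned_) {
      // First modification: deep-copy the shared dictionary.
      owned_.reset(new Dictionary(*shared_));
    }
    owned_->push_back(value);
    return static_cast<int32_t>(owned_->size() - 1);
  }

  bool copied() const { return owned_ != nullptr; }
  const Dictionary& current() const { return owned_ ? *owned_ : *shared_; }

 private:
  std::shared_ptr<const Dictionary> shared_;
  std::unique_ptr<Dictionary> owned_;
};
```

Appending values that already exist in the dictionary never triggers the copy, so the common case of re-encoding against a known dictionary stays cheap.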
[jira] [Created] (ARROW-4779) [CI] AppVeyor link failure
Francois Saint-Jacques created ARROW-4779: - Summary: [CI] AppVeyor link failure Key: ARROW-4779 URL: https://issues.apache.org/jira/browse/ARROW-4779 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22841788/job/i0bmixvlw67ty284#L671 {code:java} Version 14.00.24241.7 ExceptionCode= C005 ExceptionFlags = ExceptionAddress = 7FF78516AE57 (7FF78513) "C:\PROGRA~2\MI0E91~1.0\VC\bin\amd64\link.exe" NumberParameters = 0002 ExceptionInformation[ 0] = ExceptionInformation[ 1] = 000201EF7BF0 CONTEXT: Rax= 0011 R8 = Rbx= 00CE87C812A0 R9 = 7FF78522EA30 Rcx= 7FF78522EA30 R10= Rdx= 0011 R11= 00CE8834F0C0 Rsp= 00CE8834DC00 R12= Rbp= 00CE8834DD00 E13= Rsi= R14= 0100 Rdi= 000201EF7BF0 R15= 0001 Rip= 7FF78516AE57 EFlags = 00010202 SegCs = 0033 SegDs = 002B SegSs = 002B SegEs = 002B SegFs = 0053 SegGs = 002B Dr0= Dr3= Dr1= Dr6= Dr2= Dr7= LINK : fatal error LNK1000: unknown error at 7FF78516AE1A; consult documentation for technical support options [189/282] Building CXX object src\arrow\CMakeFiles\arrow-scalar-test.dir\scalar-test.cc.obj [190/282] Building CXX object src\arrow\CMakeFiles\arrow-public-api-test.dir\public-api-test.cc.obj ninja: build stopped: subcommand failed. C:\projects\arrow\cpp\build-debug>goto scriptexit {code}
[jira] [Created] (ARROW-4962) [C++] Warning level to CHECKIN can't compile on modern GCC
Francois Saint-Jacques created ARROW-4962: - Summary: [C++] Warning level to CHECKIN can't compile on modern GCC Key: ARROW-4962 URL: https://issues.apache.org/jira/browse/ARROW-4962 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.12.1 Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.13.0 This is somewhat related to the recent DCHECK change.
[jira] [Created] (ARROW-4838) [C++] Implement safe Make constructor
Francois Saint-Jacques created ARROW-4838: - Summary: [C++] Implement safe Make constructor Key: ARROW-4838 URL: https://issues.apache.org/jira/browse/ARROW-4838 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Fix For: 0.14.0 The following classes need validating constructors: * ArrayData * ChunkedArray * RecordBatch * Column * Table
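A sketch of the general pattern of a validating `Make` factory: the constructor is private, and the only public way to build an instance checks invariants first. `ExampleBatch`, its length invariant and the `bool` plus error-string reporting are assumptions for illustration; the issue does not specify which checks each class should perform.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Column = std::vector<int64_t>;

class ExampleBatch {
 public:
  // Validates that every column has exactly `num_rows` elements.
  // On failure, returns false, fills `*error` and leaves `*out` untouched.
  static bool Make(int64_t num_rows, std::vector<Column> columns,
                   std::unique_ptr<ExampleBatch>* out, std::string* error) {
    if (num_rows < 0) {
      *error = "num_rows must be non-negative";
      return false;
    }
    for (std::size_t i = 0; i < columns.size(); ++i) {
      if (static_cast<int64_t>(columns[i].size()) != num_rows) {
        *error = "column " + std::to_string(i) + " has length " +
                 std::to_string(columns[i].size()) + ", expected " +
                 std::to_string(num_rows);
        return false;
      }
    }
    out->reset(new ExampleBatch(num_rows, std::move(columns)));
    return true;
  }

  int64_t num_rows() const { return num_rows_; }
  std::size_t num_columns() const { return columns_.size(); }

 private:
  // Private so that callers cannot bypass validation.
  ExampleBatch(int64_t num_rows, std::vector<Column> columns)
      : num_rows_(num_rows), columns_(std::move(columns)) {}

  int64_t num_rows_;
  std::vector<Column> columns_;
};
```

Keeping the unchecked constructor private means an inconsistent object can only come from code inside the class itself.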
[jira] [Created] (ARROW-4990) [C++] Kernel to compare array with array
Francois Saint-Jacques created ARROW-4990: - Summary: [C++] Kernel to compare array with array Key: ARROW-4990 URL: https://issues.apache.org/jira/browse/ARROW-4990 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Fix For: 0.14.0
[jira] [Created] (ARROW-4999) [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes
Francois Saint-Jacques created ARROW-4999: - Summary: [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes Key: ARROW-4999 URL: https://issues.apache.org/jira/browse/ARROW-4999 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Francois Saint-Jacques Fix For: 0.14.0
[jira] [Created] (ARROW-4564) [C++] IWYU docker image silently fails
Francois Saint-Jacques created ARROW-4564: - Summary: [C++] IWYU docker image silently fails Key: ARROW-4564 URL: https://issues.apache.org/jira/browse/ARROW-4564 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Affects Versions: 0.13.0 Reporter: Francois Saint-Jacques Fix For: 0.13.0 [ARROW-4528|https://issues.apache.org/jira/browse/ARROW-4528] silently removed `iwyu` from the list of installed packages. `iwyu_tool.py` does _not_ propagate errors correctly if the `iwyu` binary is not found. This seems to be resolved in a more recent version, which will be addressed in [ARROW-4340|https://issues.apache.org/jira/browse/ARROW-4340].
[jira] [Created] (ARROW-4529) Add test coverage for BitUtils::RoundDown
Francois Saint-Jacques created ARROW-4529: - Summary: Add test coverage for BitUtils::RoundDown Key: ARROW-4529 URL: https://issues.apache.org/jira/browse/ARROW-4529 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-4531) Handling of non-aligned slices in Sum kernel
Francois Saint-Jacques created ARROW-4531: - Summary: Handling of non-aligned slices in Sum kernel Key: ARROW-4531 URL: https://issues.apache.org/jira/browse/ARROW-4531 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques The Sum kernel does not support slices where the offset is not byte-aligned. Other kernels avoid this problem by using BitmapReader.
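To make the alignment problem concrete, the sketch below sums a slice whose offset falls in the middle of a validity byte by addressing each bit individually, which is roughly the service a bit-level reader provides. `GetBit` and `SumValid` are hypothetical helpers written for this example, not Arrow's `BitmapReader`; the bitmap is assumed to be least-significant-bit first, as in the Arrow format.

```cpp
#include <cstdint>
#include <vector>

// Reads bit `i` of an LSB-first bitmap.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Sums values[offset, offset + length) whose validity bit is set.
// Works for any offset, including ones that are not multiples of 8.
inline int64_t SumValid(const std::vector<int64_t>& values,
                        const uint8_t* validity, int64_t offset,
                        int64_t length) {
  int64_t sum = 0;
  for (int64_t i = 0; i < length; ++i) {
    if (GetBit(validity, offset + i)) {
      sum += values[offset + i];
    }
  }
  return sum;
}
```

A faster kernel would typically handle the unaligned leading and trailing bits this way and process the whole bytes in between in bulk.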
[jira] [Created] (ARROW-4530) Review Aggregate kernel state allocation/ownership semantics
Francois Saint-Jacques created ARROW-4530: - Summary: Review Aggregate kernel state allocation/ownership semantics Key: ARROW-4530 URL: https://issues.apache.org/jira/browse/ARROW-4530 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-4364) [C++] Fix -weverything -wextra compilation errors
Francois Saint-Jacques created ARROW-4364: - Summary: [C++] Fix -weverything -wextra compilation errors Key: ARROW-4364 URL: https://issues.apache.org/jira/browse/ARROW-4364 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.12.0 Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.13.0
[jira] [Created] (ARROW-5171) [C++] Use LESS instead of LOWER in compare enum option.
Francois Saint-Jacques created ARROW-5171: - Summary: [C++] Use LESS instead of LOWER in compare enum option. Key: ARROW-5171 URL: https://issues.apache.org/jira/browse/ARROW-5171 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques See https://github.com/apache/arrow/pull/3963#discussion_r275596603
[jira] [Created] (ARROW-5175) [Benchmarking] Decide which benchmarks are part of regression checks
Francois Saint-Jacques created ARROW-5175: - Summary: [Benchmarking] Decide which benchmarks are part of regression checks Key: ARROW-5175 URL: https://issues.apache.org/jira/browse/ARROW-5175 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-5489) [C++
Francois Saint-Jacques created ARROW-5489: - Summary: [C++ Key: ARROW-5489 URL: https://issues.apache.org/jira/browse/ARROW-5489 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-5464) [Archery] Bad --benchmark-filter default
Francois Saint-Jacques created ARROW-5464: - Summary: [Archery] Bad --benchmark-filter default Key: ARROW-5464 URL: https://issues.apache.org/jira/browse/ARROW-5464 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques
[jira] [Created] (ARROW-5527) [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) for heap data
Francois Saint-Jacques created ARROW-5527: - Summary: [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) for heap data Key: ARROW-5527 URL: https://issues.apache.org/jira/browse/ARROW-5527 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques The current implementation uses `std::vector` and `std::string` with unbounded size. The refactor would take a memory pool in the constructor for buffer management and would get rid of the vectors. This will have the side effect of propagating Status to some calls (notably insert, since Upsize can fail to resize).
[jira] [Created] (ARROW-5530) [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null behavior
Francois Saint-Jacques created ARROW-5530: - Summary: [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null behavior Key: ARROW-5530 URL: https://issues.apache.org/jira/browse/ARROW-5530 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-5611) [C++] Improve clang-tidy speed
Francois Saint-Jacques created ARROW-5611: - Summary: [C++] Improve clang-tidy speed Key: ARROW-5611 URL: https://issues.apache.org/jira/browse/ARROW-5611 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Francois Saint-Jacques See https://github.com/apache/arrow/pull/4293#issuecomment-501950675
[jira] [Created] (ARROW-5612) [Python][Documentation] Add prominent note that date_as_object option changed with Arrow 0.13
Francois Saint-Jacques created ARROW-5612: - Summary: [Python][Documentation] Add prominent note that date_as_object option changed with Arrow 0.13 Key: ARROW-5612 URL: https://issues.apache.org/jira/browse/ARROW-5612 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques
[jira] [Created] (ARROW-5652) [CI] Fix iwyu docker image
Francois Saint-Jacques created ARROW-5652: - Summary: [CI] Fix iwyu docker image Key: ARROW-5652 URL: https://issues.apache.org/jira/browse/ARROW-5652 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques See [https://travis-ci.org/ursa-labs/crossbow/builds/547691665?utm_source=github_status_medium=notification]
[jira] [Created] (ARROW-5653) [CI] Fix cpp docker image
Francois Saint-Jacques created ARROW-5653: - Summary: [CI] Fix cpp docker image Key: ARROW-5653 URL: https://issues.apache.org/jira/browse/ARROW-5653 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques {code:shell} make -f Makefile.docker run-cpp ... 54/64 Test #79: arrow-dataset-file_test ***Failed0.04 sec Running arrow-dataset-file_test, redirecting output into /build/cpp/build/test-logs/arrow-dataset-file_test.txt (attempt 1/1) /build/cpp/debug/arrow-dataset-file_test: error while loading shared libraries: libbrotlienc.so.1: cannot open shared object file: No such file or directory /build/cpp/src/arrow/dataset Start 80: arrow-flight-test 55/64 Test #80: arrow-flight-test ..***Failed0.04 sec Running arrow-flight-test, redirecting output into /build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1) /build/cpp/debug/arrow-flight-t {code}
[jira] [Created] (ARROW-5680) [Rust] Nightly tests are failing
Francois Saint-Jacques created ARROW-5680: - Summary: [Rust] Nightly tests are failing Key: ARROW-5680 URL: https://issues.apache.org/jira/browse/ARROW-5680 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: Francois Saint-Jacques See https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further failures, e.g. {code:bash} running 18 tests test csv_query_group_by_int_min_max ... FAILED test csv_query_external_table_count ... ok test csv_query_count ... ok test csv_count_star ... ok test csv_query_avg ... ok test csv_query_avg_multi_batch ... ok test csv_query_cast ... ok test csv_query_group_by_avg ... FAILED test csv_query_group_by_string_min_max ... FAILED test csv_query_group_by_int_count ... FAILED test csv_query_limit ... ok test csv_query_limit_bigger_than_nbr_of_rows ... ok test csv_query_limit_with_same_nbr_of_rows ... ok test csv_query_cast_literal ... ok test csv_query_limit_zero ... ok test csv_query_create_external_table ... ok test csv_query_with_predicate ... ok test parquet_query ... ok failures: csv_query_group_by_int_min_max stdout thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left == right)` left: `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`, right: `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`', datafusion/tests/sql.rs:77:5 note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace. 
csv_query_group_by_avg stdout thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == right)` left: `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`, right: `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`', datafusion/tests/sql.rs:99:5 csv_query_group_by_string_min_max stdout thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: `(left == right)` left: `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`, right: `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`', datafusion/tests/sql.rs:187:5 csv_query_group_by_int_count stdout thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left == right)` left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`, right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', datafusion/tests/sql.rs:175:5 {code} I suspect that the tests expect the group-by results in a fixed order, which would depend on the iteration order of the hash table. Note that once I did a rustup update (and docker rmi rustlangrust/nightly), the failures went away. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5544) [Archery] should not return non-zero in `benchmark diff` sub command on regression
Francois Saint-Jacques created ARROW-5544: - Summary: [Archery] should not return non-zero in `benchmark diff` sub command on regression Key: ARROW-5544 URL: https://issues.apache.org/jira/browse/ARROW-5544 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques When a regression is detected but the command ran successfully, it should return zero. Currently it returns the number of regressions. This is to play better with ursabot. It should be left to the user to decide what to do with the JSON data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5269) [C++] Whitelist benchmarks candidates for regression checks
Francois Saint-Jacques created ARROW-5269: - Summary: [C++] Whitelist benchmarks candidates for regression checks Key: ARROW-5269 URL: https://issues.apache.org/jira/browse/ARROW-5269 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 0.14.0 Rename all benchmarks that are candidates for regression checks so that they carry the `Regression` prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
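A minimal sketch of how such a prefix could act as a whitelist, assuming benchmark names are available as plain strings. `SelectRegressionBenchmarks` is a hypothetical helper for illustration, not part of Arrow or Archery.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: keeps only the benchmark names that start with the
// `Regression` prefix mentioned in the issue.
std::vector<std::string> SelectRegressionBenchmarks(
    const std::vector<std::string>& names) {
  static const std::string kPrefix = "Regression";
  std::vector<std::string> selected;
  for (const auto& name : names) {
    // compare() clips the length, so names shorter than the prefix are safe.
    if (name.compare(0, kPrefix.size(), kPrefix) == 0) {
      selected.push_back(name);
    }
  }
  return selected;
}
```

A prefix keeps the selection visible in the benchmark name itself, so no separate list has to be maintained.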
[jira] [Created] (ARROW-5251) [C++][Parquet] Bad initialization in statistics computation
Francois Saint-Jacques created ARROW-5251: - Summary: [C++][Parquet] Bad initialization in statistics computation Key: ARROW-5251 URL: https://issues.apache.org/jira/browse/ARROW-5251 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques The following lines are undefined if the first element is null. https://github.com/apache/arrow/blob/250e97c70f497581bca412dfd2a654a1f9736064/cpp/src/parquet/statistics.cc#L159-L160 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
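A minimal sketch of the hazard described above, not the Parquet code at the linked lines: seeding min/max from the first slot reads an undefined value when that slot is null. `ComputeMinMax` is a hypothetical helper, and validity is modeled here as a `std::vector<bool>` purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper, not the Parquet API: returns (min, max) over the valid
// slots only, or std::nullopt when every slot is null.
std::optional<std::pair<int64_t, int64_t>> ComputeMinMax(
    const std::vector<int64_t>& values, const std::vector<bool>& is_valid) {
  std::optional<std::pair<int64_t, int64_t>> result;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!is_valid[i]) continue;  // never read a null slot
    if (!result) {
      // Seed from the first *valid* value rather than from values[0].
      result = std::make_pair(values[i], values[i]);
    } else {
      if (values[i] < result->first) result->first = values[i];
      if (values[i] > result->second) result->second = values[i];
    }
  }
  return result;
}
```

Returning an empty optional for an all-null input makes the "no statistics" case explicit instead of leaving min and max uninitialized.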
[jira] [Created] (ARROW-5253) [C++] external Snappy fails on Alpine
Francois Saint-Jacques created ARROW-5253: - Summary: [C++] external Snappy fails on Alpine Key: ARROW-5253 URL: https://issues.apache.org/jira/browse/ARROW-5253 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.13.0 Reporter: Francois Saint-Jacques Fix For: 0.14.0 {code:bash} FAILED: debug/libarrow.so.14.0.0 : && /usr/bin/c++ -fPIC -Wno-noexcept-type -fdiagnostics-color=always -ggdb -O0 -Wall -Wno-conversion -Wno-sign-conversion -Wno-unused-variable -Werror -msse4.2 -g -Wl,--version-script=/buildbot/amd64-alpine-3_9-cpp/cpp/src/arrow/symbols.map -shared -Wl,-soname,libarrow.so.14 -o debug/libarrow.so.14.0.0 ... c++: error: snappy_ep/src/snappy_ep-install/lib/libsnappy.a: No such file or directory {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5739) [CI] Fix docker python build
Francois Saint-Jacques created ARROW-5739: - Summary: [CI] Fix docker python build Key: ARROW-5739 URL: https://issues.apache.org/jira/browse/ARROW-5739 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration Reporter: Francois Saint-Jacques The python docker image fails to clean the build directory, so it installs the artifacts of a previous invocation of `docker-compose run python`. This does not affect CI, which drops the `/build` mount, but only local users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5779) [R][CI] R's docker image fails due to incompatibility
Francois Saint-Jacques created ARROW-5779: - Summary: [R][CI] R's docker image fails due to incompatibility Key: ARROW-5779 URL: https://issues.apache.org/jira/browse/ARROW-5779 Project: Apache Arrow Issue Type: Bug Reporter: Francois Saint-Jacques {code:bash} The downloaded source packages are in '/tmp/RtmpLu0eiq/downloaded_packages' v checking for file '/tmp/RtmpLu0eiq/remotes1a8d7c759a55/romainfrancois-decor-6c5a5aa/DESCRIPTION' ... - preparing 'decor': v checking DESCRIPTION meta-information ... - cleaning src - checking for LF line-endings in source and make files and shell scripts - checking for empty or unneeded directories - building 'decor_0.0.0.9001.tar.gz' Installing package into '/usr/local/lib/R/site-library' (as 'lib' is unspecified) ERROR: this R is version 3.4.4, package 'decor' requires R >= 3.5.0 Error: Failed to install 'decor' from GitHub: (converted from warning) installation of package '/tmp/RtmpLu0eiq/file1a8d6986708c/decor_0.0.0.9001.tar.gz' had non-zero exit status Execution halted ERROR: Service 'r' failed to build: The command '/bin/sh -c Rscript -e "install.packages('devtools', repos = 'http://cran.rstudio.com')" && Rscript -e "devtools::install_github('romainfrancois/decor')" && Rscript -e "install.packages(c( 'Rcpp', 'dplyr', 'stringr', 'glue', 'vctrs', 'purrr', 'assertthat', 'fs', 'tibble', 'crayon', 'testthat', 'bit64', 'hms', 'lubridate'), repos = 'https://cran.rstudio.com')"' returned a non-zero code: 1 Makefile.docker:49: recipe for target 'build-r' failed {code} I'm not sure if the fix is just to bump R's version in the image, or avoid the failing package. cc [~romainfrancois] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5914) [CI] Build bundled dependencies in docker build step
Francois Saint-Jacques created ARROW-5914: - Summary: [CI] Build bundled dependencies in docker build step Key: ARROW-5914 URL: https://issues.apache.org/jira/browse/ARROW-5914 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques Fix For: 1.0.0 In the recently introduced ARROW-5803, some heavy dependencies (thrift, protobuf, flatbuffers, grpc) are built at each invocation of docker-compose build (thus each travis test). We should aim to build the third-party dependencies in the docker build phase instead, to exploit caching and docker-compose pull so that the CI step doesn't need to build said dependencies each time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-5923) [C++] Fix int96 comment
Francois Saint-Jacques created ARROW-5923: - Summary: [C++] Fix int96 comment Key: ARROW-5923 URL: https://issues.apache.org/jira/browse/ARROW-5923 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Francois Saint-Jacques Assignee: Micah Kornfield -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-5202) [C++
Francois Saint-Jacques created ARROW-5202: - Summary: [C++ Key: ARROW-5202 URL: https://issues.apache.org/jira/browse/ARROW-5202 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5781) [Archery] Ensure benchmark clone accepts remotes in revision
Francois Saint-Jacques created ARROW-5781: - Summary: [Archery] Ensure benchmark clone accepts remotes in revision Key: ARROW-5781 URL: https://issues.apache.org/jira/browse/ARROW-5781 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Affects Versions: 0.13.0 Reporter: Francois Saint-Jacques Found that ursabot would always compare the PR tip commit with itself via https://github.com/apache/arrow/pull/4739#issuecomment-506819250 . This is due to buildbot's GitHub behavior of running a local git-reset --hard, which changes the `master` rev to this new state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-6123) [C++] IsIn kernel should not materialize the output internal
Francois Saint-Jacques created ARROW-6123: - Summary: [C++] IsIn kernel should not materialize the output internal Key: ARROW-6123 URL: https://issues.apache.org/jira/browse/ARROW-6123 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques It should use the helpers since the output size is known. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6121) [Tools] Improve merge tool cli ergonomic
Francois Saint-Jacques created ARROW-6121: - Summary: [Tools] Improve merge tool cli ergonomic Key: ARROW-6121 URL: https://issues.apache.org/jira/browse/ARROW-6121 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques * Accepts the pull-request number as an optional (first) parameter to the script * Supports reading the jira username/password from a file -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6122) [C++] IsIn kernel must support FixedSizeBinary
Francois Saint-Jacques created ARROW-6122: - Summary: [C++] IsIn kernel must support FixedSizeBinary Key: ARROW-6122 URL: https://issues.apache.org/jira/browse/ARROW-6122 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.15.0 Reporter: Francois Saint-Jacques -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6124) [C++] IsIn kernel should sort in a single pass (with nulls)
Francois Saint-Jacques created ARROW-6124: - Summary: [C++] IsIn kernel should sort in a single pass (with nulls) Key: ARROW-6124 URL: https://issues.apache.org/jira/browse/ARROW-6124 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.15.0 Reporter: Francois Saint-Jacques There's a good chance that merge sort must be implemented (spill to disk, ChunkedArray, ...) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6244) [C++] Implement Partition DataSource
Francois Saint-Jacques created ARROW-6244: - Summary: [C++] Implement Partition DataSource Key: ARROW-6244 URL: https://issues.apache.org/jira/browse/ARROW-6244 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Francois Saint-Jacques This is a DataSource that also has partition metadata. The end goal is to support filtering with a DataSelector/Filter expression. The initial implementation should not deal with PartitionScheme yet. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6242) [C++] Implements basic Dataset/Scanner/ScannerBuilder
Francois Saint-Jacques created ARROW-6242: - Summary: [C++] Implements basic Dataset/Scanner/ScannerBuilder Key: ARROW-6242 URL: https://issues.apache.org/jira/browse/ARROW-6242 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Francois Saint-Jacques The goal of this would be to iterate over a Dataset and generate a "flattened" stream of RecordBatches from the union of data sources and data fragments. This should not bother with filtering yet. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6396) [C++] Add CompareOptions to Compare kernels
Francois Saint-Jacques created ARROW-6396: - Summary: [C++] Add CompareOptions to Compare kernels Key: ARROW-6396 URL: https://issues.apache.org/jira/browse/ARROW-6396 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Francois Saint-Jacques This would add an enum ResolveNull \{ KLEENE_LOGIC, NULL_PROPAGATE }. -- This message was sent by Atlassian Jira (v8.3.2#803003)
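The issue does not spell out how the two modes would apply to comparisons. As an illustration only, the sketch below shows the difference on a logical AND, where Kleene logic is well defined. The `CompareOptions` layout, its default, and the `And` helper are assumptions, and `std::optional<bool>` stands in for a nullable boolean.

```cpp
#include <optional>

enum class ResolveNull { KLEENE_LOGIC, NULL_PROPAGATE };

// Hypothetical options struct; the default value is an assumption.
struct CompareOptions {
  ResolveNull resolve_null = ResolveNull::NULL_PROPAGATE;
};

// A nullable boolean: std::nullopt stands for null.
using MaybeBool = std::optional<bool>;

// Illustrative AND of two nullable booleans under either null policy.
MaybeBool And(MaybeBool lhs, MaybeBool rhs, const CompareOptions& options) {
  if (options.resolve_null == ResolveNull::NULL_PROPAGATE) {
    // Any null input yields a null output.
    if (!lhs || !rhs) return std::nullopt;
    return *lhs && *rhs;
  }
  // Kleene logic: a known false decides the result even if the other side
  // is null; otherwise a null input makes the result unknown.
  if ((lhs && !*lhs) || (rhs && !*rhs)) return false;
  if (!lhs || !rhs) return std::nullopt;
  return true;
}
```

With these definitions, `false AND null` is `false` under `KLEENE_LOGIC` and null under `NULL_PROPAGATE`.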
[jira] [Created] (ARROW-6397) [C++][CI] Fix S3 minio failure
Francois Saint-Jacques created ARROW-6397: - Summary: [C++][CI] Fix S3 minio failure Key: ARROW-6397 URL: https://issues.apache.org/jira/browse/ARROW-6397 Project: Apache Arrow Issue Type: New Feature Components: C++, Continuous Integration Reporter: Francois Saint-Jacques See [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27065941/job/gwjmr2hudm7693ef] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6448) [CI] Add crossbow notifications
Francois Saint-Jacques created ARROW-6448: - Summary: [CI] Add crossbow notifications Key: ARROW-6448 URL: https://issues.apache.org/jira/browse/ARROW-6448 Project: Apache Arrow Issue Type: New Feature Components: Continuous Integration Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6378) [C++][Dataset] Implement TreeDataSource
Francois Saint-Jacques created ARROW-6378: - Summary: [C++][Dataset] Implement TreeDataSource Key: ARROW-6378 URL: https://issues.apache.org/jira/browse/ARROW-6378 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques The TreeDataSource is required to support partition pruning of sub-trees. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6341) [Python] Implements low-level bindings to Dataset classes:
Francois Saint-Jacques created ARROW-6341: - Summary: [Python] Implements low-level bindings to Dataset classes: Key: ARROW-6341 URL: https://issues.apache.org/jira/browse/ARROW-6341 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Francois Saint-Jacques The following classes should be accessible from Python: * class DataSource * class DataFragment * function DiscoverySource * class ScanContext, ScanOptions, ScanTask * class Dataset * class ScannerBuilder * class Scanner The end result is reading a directory of parquet files as a single stream. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6340) [R] Implements low-level bindings to Dataset classes
Francois Saint-Jacques created ARROW-6340: - Summary: [R] Implements low-level bindings to Dataset classes Key: ARROW-6340 URL: https://issues.apache.org/jira/browse/ARROW-6340 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Francois Saint-Jacques The following classes should be accessible from R: * class DataSource * class DataFragment * function DiscoverySource * class ScanContext, ScanOptions, ScanTask * class Dataset * class ScannerBuilder * class Scanner The end result is reading a directory of parquet files as a single stream -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6476) [Java][CI] Travis java all-jdks job is broken
Francois Saint-Jacques created ARROW-6476: - Summary: [Java][CI] Travis java all-jdks job is broken Key: ARROW-6476 URL: https://issues.apache.org/jira/browse/ARROW-6476 Project: Apache Arrow Issue Type: Bug Reporter: Francois Saint-Jacques Introduced by ARROW-6433, fixing the shade check enabled evaluation of the incorrect body. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (ARROW-6605) [C++] Add recursion depth control to fs::Selector
Francois Saint-Jacques created ARROW-6605: - Summary: [C++] Add recursion depth control to fs::Selector Key: ARROW-6605 URL: https://issues.apache.org/jira/browse/ARROW-6605 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques This is similar to the recursive option, but also controls the depth. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6606) [C++] Construct tree structure from std::vector
Francois Saint-Jacques created ARROW-6606: - Summary: [C++] Construct tree structure from std::vector Key: ARROW-6606 URL: https://issues.apache.org/jira/browse/ARROW-6606 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques This will be used by FileSystemDataSource for pushdown predicate pruning of branches. -- This message was sent by Atlassian Jira (v8.3.4#803005)
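A minimal sketch of the idea, assuming the vector holds '/'-separated path strings (the element type is elided in the summary above). `PathNode`, `BuildTree` and `CountFiles` are hypothetical names, not Arrow APIs.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical node type: one node per path segment.
struct PathNode {
  std::map<std::string, std::unique_ptr<PathNode>> children;
  bool is_file = false;  // true when an input path ends at this node
};

// Builds a tree from '/'-separated paths; shared prefixes share nodes.
std::unique_ptr<PathNode> BuildTree(const std::vector<std::string>& paths) {
  auto root = std::make_unique<PathNode>();
  for (const auto& path : paths) {
    PathNode* node = root.get();
    std::istringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/')) {
      if (segment.empty()) continue;  // skip leading or doubled '/'
      auto& child = node->children[segment];
      if (!child) child = std::make_unique<PathNode>();
      node = child.get();
    }
    if (node != root.get()) node->is_file = true;
  }
  return root;
}

// Counts the files reachable from `node`.
std::size_t CountFiles(const PathNode& node) {
  std::size_t count = node.is_file ? 1 : 0;
  for (const auto& entry : node.children) count += CountFiles(*entry.second);
  return count;
}
```

With this shape, pruning a branch amounts to not descending into one child, which skips every file below it in a single decision.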
[jira] [Created] (ARROW-6615) [C++] Add filtering option to fs::Selector
Francois Saint-Jacques created ARROW-6615: - Summary: [C++] Add filtering option to fs::Selector Key: ARROW-6615 URL: https://issues.apache.org/jira/browse/ARROW-6615 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques It would be convenient if Selector could support file path filtering, either via a regex or globbing applied to the path. This is semi-required for filtering files in Dataset to properly apply the file format. -- This message was sent by Atlassian Jira (v8.3.4#803005)
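A minimal sketch of the regex variant, applied to a plain list of path strings rather than to `fs::Selector` itself. `FilterPaths` is a hypothetical helper, and the pattern uses the default ECMAScript grammar of `std::regex`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keeps the paths that match `pattern` in full.
std::vector<std::string> FilterPaths(const std::vector<std::string>& paths,
                                     const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> kept;
  for (const auto& path : paths) {
    // regex_match requires the whole path to match, not just a substring.
    if (std::regex_match(path, re)) kept.push_back(path);
  }
  return kept;
}
```

For example, a pattern such as `.*\.parquet` would drop marker files like `_SUCCESS` before a file format is applied to the remaining paths.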
[jira] [Created] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery
Francois Saint-Jacques created ARROW-6614: - Summary: [C++][Dataset] Implement FileSystemDataSourceDiscovery Key: ARROW-6614 URL: https://issues.apache.org/jira/browse/ARROW-6614 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques DataSourceDiscovery is what allows inferring a schema and constructing a DataSource with a PartitionScheme. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6161) [C++] Implements dataset::ParquetFile and associated Scan structures
Francois Saint-Jacques created ARROW-6161: - Summary: [C++] Implements dataset::ParquetFile and associated Scan structures Key: ARROW-6161 URL: https://issues.apache.org/jira/browse/ARROW-6161 Project: Apache Arrow Issue Type: New Feature Reporter: Francois Saint-Jacques -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6148) Missing debian build dependencies
Francois Saint-Jacques created ARROW-6148: - Summary: Missing debian build dependencies Key: ARROW-6148 URL: https://issues.apache.org/jira/browse/ARROW-6148 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Francois Saint-Jacques -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (ARROW-6730) [CI] Use Github Actions for "C++ with clang 7" docker image
Francois Saint-Jacques created ARROW-6730: - Summary: [CI] Use Github Actions for "C++ with clang 7" docker image Key: ARROW-6730 URL: https://issues.apache.org/jira/browse/ARROW-6730 Project: Apache Arrow Issue Type: New Feature Components: Continuous Integration Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6769) [C++][Dataset] End to End dataset integration test case
Francois Saint-Jacques created ARROW-6769: - Summary: [C++][Dataset] End to End dataset integration test case Key: ARROW-6769 URL: https://issues.apache.org/jira/browse/ARROW-6769 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Francois Saint-Jacques 1. Create a DataSource from a known directory and a PartitionScheme. 2. Create a Dataset from the previous DataSource. 3. Request a ScannerBuilder from previous Dataset. 4. Add filter expression to ScannerBuilder (and other options). 5. Finalize into a Scan operation. 6. Materialize into an arrow::Table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7148) [C++][Dataset] API cleanup
Francois Saint-Jacques created ARROW-7148: - Summary: [C++][Dataset] API cleanup Key: ARROW-7148 URL: https://issues.apache.org/jira/browse/ARROW-7148 Project: Apache Arrow Issue Type: Improvement Components: C++ - Dataset Reporter: Francois Saint-Jacques Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7178) [C++] Vendor forward compatible std::optional
Francois Saint-Jacques created ARROW-7178: - Summary: [C++] Vendor forward compatible std::optional Key: ARROW-7178 URL: https://issues.apache.org/jira/browse/ARROW-7178 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Having std::optional was mentioned a few times; [~emkornfi...@gmail.com] suggested https://github.com/martinmoene/optional-lite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7079) [C++][Dataset] Implement ScalarAsStatisctics for non-primitive types
Francois Saint-Jacques created ARROW-7079: - Summary: [C++][Dataset] Implement ScalarAsStatisctics for non-primitive types Key: ARROW-7079 URL: https://issues.apache.org/jira/browse/ARROW-7079 Project: Apache Arrow Issue Type: Bug Components: C++ - Dataset Reporter: Francois Saint-Jacques Statistics are not extracted for the following (parquet) types - BYTE_ARRAY - FLBA - Any logical timestamps/dates -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7360) [R] Can't use dataset's filter with non-literal expression
Francois Saint-Jacques created ARROW-7360: - Summary: [R] Can't use dataset's filter with non-literal expression Key: ARROW-7360 URL: https://issues.apache.org/jira/browse/ARROW-7360 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Francois Saint-Jacques The following will generate an error {code:r} test_that("filtering with expression", { char_sym <- "b" expect_dplyr_equal( input %>% filter(chr == char_sym) %>% select(string = chr, int) %>% collect(), tbl ) }) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7390) [C++][Dataset] Concurrency race in Projector::Project
Francois Saint-Jacques created ARROW-7390: - Summary: [C++][Dataset] Concurrency race in Projector::Project Key: ARROW-7390 URL: https://issues.apache.org/jira/browse/ARROW-7390 Project: Apache Arrow Issue Type: Bug Reporter: Francois Saint-Jacques When two scan tasks of the same DataFragment run concurrently, they race to invoke SetInputSchema. Note that ResizeMissingColumns also suffers from this race. The ideal goal is to make Project a const method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7380) [C++][Dataset] Implement DatasetDiscovery
Francois Saint-Jacques created ARROW-7380: - Summary: [C++][Dataset] Implement DatasetDiscovery Key: ARROW-7380 URL: https://issues.apache.org/jira/browse/ARROW-7380 Project: Apache Arrow Issue Type: Improvement Components: C++ - Dataset Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Takes a list of DataSourceDiscovery and yields a Dataset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7379) [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith
Francois Saint-Jacques created ARROW-7379: - Summary: [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith Key: ARROW-7379 URL: https://issues.apache.org/jira/browse/ARROW-7379 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques The methods verify whether fields/schemas are compatible with regard to naming and type. This is partly extracted from `UnifySchemas`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7338) [C++] Rename SimpleDataSource to InMemoryDataSource
Francois Saint-Jacques created ARROW-7338: - Summary: [C++] Rename SimpleDataSource to InMemoryDataSource Key: ARROW-7338 URL: https://issues.apache.org/jira/browse/ARROW-7338 Project: Apache Arrow Issue Type: Improvement Components: C++ - Dataset Reporter: Francois Saint-Jacques The constructor should take a generator {code:c++} // Some comments here class InMemoryDataSource : public DataSource { public: using Generator = std::function>; InMemoryDataSource(Generator&& generator); // Convenience constructor to support a fixed list of RecordBatch InMemoryDataSource(std::shared_ptr); InMemoryDataSource(std::vector>); private: Generator generator; } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7339) [CMake] Thrift version not respected in CMake configuration version.txt
Francois Saint-Jacques created ARROW-7339: - Summary: [CMake] Thrift version not respected in CMake configuration version.txt Key: ARROW-7339 URL: https://issues.apache.org/jira/browse/ARROW-7339 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques If thrift is requested via BUNDLED, thrift 0.9.1 will be downloaded instead of the requested version. This is due to FindThrift.cmake overriding THRIFT_VERSION from the locally installed thrift compiler (0.9.1 on Ubuntu 18.04). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7007) [C++] Enable mmap option for LocalFs
Francois Saint-Jacques created ARROW-7007: - Summary: [C++] Enable mmap option for LocalFs Key: ARROW-7007 URL: https://issues.apache.org/jira/browse/ARROW-7007 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6953) [C++][Dataset] Implement Gandiva Filter/Projector in Scanner
Francois Saint-Jacques created ARROW-6953: - Summary: [C++][Dataset] Implement Gandiva Filter/Projector in Scanner Key: ARROW-6953 URL: https://issues.apache.org/jira/browse/ARROW-6953 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques Currently, we have `RecordBatchProjector` and `ExpressionEvaluator` to achieve this feature. This would implement a single class that fuses both and uses Gandiva. This would be exposed in the ScannerBuilder via an option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6956) [C++] Status should use unique_ptr
Francois Saint-Jacques created ARROW-6956: - Summary: [C++] Status should use unique_ptr Key: ARROW-6956 URL: https://issues.apache.org/jira/browse/ARROW-6956 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques The logic of Status::State is _very_ similar to unique_ptr, except for the deep copy on copy. -- This message was sent by Atlassian Jira (v8.3.4#803005)
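A minimal sketch of the pattern, using a simplified stand-in class rather than the real `arrow::Status`: a `std::unique_ptr` owns the state, moves are defaulted, and only the copy operations need hand-written deep-copy logic. The `State` layout (an `int` code and a message) is an assumption for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Simplified, hypothetical stand-in for a Status-like class.
class Status {
 private:
  struct State {
    int code;
    std::string msg;
  };

 public:
  Status() = default;  // OK status: no state is allocated
  Status(int code, std::string msg)
      : state_(new State{code, std::move(msg)}) {}

  // unique_ptr is move-only, so copying must clone the state explicitly.
  Status(const Status& other)
      : state_(other.state_ ? new State(*other.state_) : nullptr) {}
  Status& operator=(const Status& other) {
    if (this != &other) {
      state_.reset(other.state_ ? new State(*other.state_) : nullptr);
    }
    return *this;
  }

  // Moves are exactly what unique_ptr already provides.
  Status(Status&&) noexcept = default;
  Status& operator=(Status&&) noexcept = default;

  bool ok() const { return state_ == nullptr; }
  std::string message() const { return ok() ? std::string() : state_->msg; }

 private:
  std::unique_ptr<State> state_;
};
```

The destructor and the move operations come for free; the deep copy is the only part that differs from a plain `unique_ptr` member.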
[jira] [Created] (ARROW-6950) [C++][Dataset] Add example/benchmark for reading parquet files with dataset
Francois Saint-Jacques created ARROW-6950: - Summary: [C++][Dataset] Add example/benchmark for reading parquet files with dataset Key: ARROW-6950 URL: https://issues.apache.org/jira/browse/ARROW-6950 Project: Apache Arrow Issue Type: Test Components: C++ Reporter: Francois Saint-Jacques Create an executable that loads a directory with a known partition scheme, applying a filter and a projection. This will be used as a baseline for future performance improvements, but also to show various features of the dataset API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6951) [C++][Dataset] Ensure column projection is passed to ParquetDataFragment
Francois Saint-Jacques created ARROW-6951: - Summary: [C++][Dataset] Ensure column projection is passed to ParquetDataFragment Key: ARROW-6951 URL: https://issues.apache.org/jira/browse/ARROW-6951 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6952) [C++][Dataset] Ensure expression filter is passed to ParquetDataFragment
Francois Saint-Jacques created ARROW-6952: - Summary: [C++][Dataset] Ensure expression filter is passed to ParquetDataFragment Key: ARROW-6952 URL: https://issues.apache.org/jira/browse/ARROW-6952 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques We should be able to prune RowGroups based on the expression and the statistics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
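A minimal sketch of statistics-based pruning for one predicate shape, `column > threshold`, assuming each row group exposes min/max statistics for an integer column. `RowGroupStats` and `SelectRowGroupsGreaterThan` are hypothetical names, not the Parquet or Dataset API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-row-group statistics for a single int64 column.
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

// Returns the indices of row groups that may contain rows satisfying
// `column > threshold`. A group whose max is <= threshold cannot match.
std::vector<std::size_t> SelectRowGroupsGreaterThan(
    const std::vector<RowGroupStats>& stats, int64_t threshold) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (stats[i].max > threshold) selected.push_back(i);
  }
  return selected;
}
```

The selection is conservative: a kept row group may still contain non-matching rows, so the filter has to be re-applied to the rows that are actually read.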
[jira] [Created] (ARROW-6902) [C++] Add String*/Binary* support for Compare kernels
Francois Saint-Jacques created ARROW-6902: - Summary: [C++] Add String*/Binary* support for Compare kernels Key: ARROW-6902 URL: https://issues.apache.org/jira/browse/ARROW-6902 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6987) [CI] Travis OSX failing to install sdk headers
Francois Saint-Jacques created ARROW-6987: - Summary: [CI] Travis OSX failing to install sdk headers Key: ARROW-6987 URL: https://issues.apache.org/jira/browse/ARROW-6987 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Francois Saint-Jacques {code:java} sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target / installer: Package name is macOS_SDK_headers_for_macOS_10.14 installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override. The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during . {code} See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6988) [CI][R] Buildbot's R Conda is failing
Francois Saint-Jacques created ARROW-6988: - Summary: [CI][R] Buildbot's R Conda is failing Key: ARROW-6988 URL: https://issues.apache.org/jira/browse/ARROW-6988 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques {code:java} Running ‘testthat.R’ ERROR Running the tests in ‘tests/testthat.R’ failed. Last 13 lines of output: 25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled && !is.null(test)) {skip_empty()}}, expectation = handle_expectation, skip = handle_skip, warning = handle_warning, message = handle_message, error = handle_error), error = handle_fatal, skip = function(e) {}) 26: test_code(NULL, exprs, env) 27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap) 28: force(code) 29: with_reporter(reporter = reporter, start_end_reporter = start_end_reporter, {reporter$start_file(basename(path)) lister$start_file(basename(path))source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)reporter$.end_context() reporter$end_file()}) 30: FUN(X[[i]], ...) 31: lapply(paths, test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap) 32: force(code) 33: with_reporter(reporter = current_reporter, results <- lapply(paths, test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)) 34: test_files(paths, reporter = reporter, env = env, stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap) 35: test_dir(path = test_path, reporter = reporter, env = env, filter = filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap) 36: test_package_dir(package = package, test_path = test_path, filter = filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap) 37: test_check("arrow") An irrecoverable exception occurred. R is aborting now ... 
Segmentation fault (core dumped) * checking for unstated dependencies in vignettes ... OK * checking package vignettes in ‘inst/doc’ ... OK * checking re-building of vignette outputs ... OK * DONE Status: 1 ERROR, 1 WARNING, 2 NOTEs See ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’ for details. {code} [https://ci.ursalabs.org/#/builders/95/builds/2386] [https://ci.ursalabs.org/#/builders/95] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types
Francois Saint-Jacques created ARROW-7017: - Summary: [C++] Refactor AddKernel to support other operations and types Key: ARROW-7017 URL: https://issues.apache.org/jira/browse/ARROW-7017 Project: Apache Arrow Issue Type: Improvement Components: C++ - Compute Reporter: Francois Saint-Jacques
* Should avoid using builders (and/or NULLs) since the output shape is known at compute time.
* Should be refactored to support other operations, e.g. Subtraction, Multiplication.
* Should have an overflow/underflow detection mode.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
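The three points in ARROW-7017 could be sketched roughly as follows. This is a minimal, self-contained illustration and not Arrow's kernel API: `AddOp`, `SubtractOp`, `MultiplyOp`, `OverflowMode` and `BinaryArithmetic` are hypothetical names, plain `std::vector` stands in for Arrow arrays, null handling is left out entirely, and the `__builtin_*_overflow` intrinsics assume GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical operation functors (not Arrow classes). Each returns true when
// the exact result does not fit in T; *out then holds the wrapped value.
struct AddOp {
  template <typename T>
  static bool Call(T a, T b, T* out) { return __builtin_add_overflow(a, b, out); }
};
struct SubtractOp {
  template <typename T>
  static bool Call(T a, T b, T* out) { return __builtin_sub_overflow(a, b, out); }
};
struct MultiplyOp {
  template <typename T>
  static bool Call(T a, T b, T* out) { return __builtin_mul_overflow(a, b, out); }
};

// Hypothetical switch for the overflow/underflow detection mode.
enum class OverflowMode { kWrap, kCheck };

// Applies Op element-wise. The output is sized once up front because its
// length is known to equal the input length, so no builder is involved.
// Returns false on a length mismatch, or on overflow when mode is kCheck.
template <typename Op, typename T>
bool BinaryArithmetic(const std::vector<T>& left, const std::vector<T>& right,
                      OverflowMode mode, std::vector<T>* out) {
  assert(out != nullptr);
  if (left.size() != right.size()) return false;
  out->resize(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    T value;
    const bool overflowed = Op::Call(left[i], right[i], &value);
    if (overflowed && mode == OverflowMode::kCheck) return false;
    (*out)[i] = value;
  }
  return true;
}
```

A call such as `BinaryArithmetic<MultiplyOp>(a, b, OverflowMode::kCheck, &out)` selects the operation at compile time, so adding another operation only requires another small functor rather than a new kernel.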
[jira] [Created] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner
Francois Saint-Jacques created ARROW-6964: - Summary: [C++][Dataset] Expose a nested parallel option for Scanner Key: ARROW-6964 URL: https://issues.apache.org/jira/browse/ARROW-6964 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6965) [C++][Dataset] Optionally expose partition keys as materialized columns
Francois Saint-Jacques created ARROW-6965: - Summary: [C++][Dataset] Optionally expose partition keys as materialized columns Key: ARROW-6965 URL: https://issues.apache.org/jira/browse/ARROW-6965 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly loads file
Francois Saint-Jacques created ARROW-6969: - Summary: [C++][Dataset] ParquetScanTask eagerly loads file Key: ARROW-6969 URL: https://issues.apache.org/jira/browse/ARROW-6969 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques The file content should only be read when invoking ParquetScanTask::Scan, not on construction. Eager loading prevents reading in a true streaming fashion under memory constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
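The deferred loading requested in ARROW-6969 might look roughly like the sketch below. It is an illustration only and not the actual `ParquetScanTask` interface: `LazyScanTask` and `Loader` are hypothetical names, and a vector of strings stands in for record batches. The point is simply that construction stores how to read the file, and the read itself happens inside `Scan`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scan task (not Arrow's ParquetScanTask).
class LazyScanTask {
 public:
  // A Loader knows how to read the file and produce its "batches".
  using Loader = std::function<std::vector<std::string>()>;

  // Construction only stores the loader; no file content is read here.
  explicit LazyScanTask(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ && "loader must be callable");
  }

  // The file content is read only when the caller actually scans.
  std::vector<std::string> Scan() const { return loader_(); }

 private:
  Loader loader_;
};
```

With this shape, many tasks can be created cheaply and scanned one at a time, so only the file currently being scanned needs to be in memory.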
[jira] [Created] (ARROW-7210) [C++] Scalar cast should support time-based types
Francois Saint-Jacques created ARROW-7210: - Summary: [C++] Scalar cast should support time-based types Key: ARROW-7210 URL: https://issues.apache.org/jira/browse/ARROW-7210 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Francois Saint-Jacques This would enable at least minimal expression evaluation on time-based arrays. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-7265) [Format][C++] Clarify the usage of typeIds in Union type documentation
Francois Saint-Jacques created ARROW-7265: - Summary: [Format][C++] Clarify the usage of typeIds in Union type documentation Key: ARROW-7265 URL: https://issues.apache.org/jira/browse/ARROW-7265 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques The documentation is unclear. -- This message was sent by Atlassian Jira (v8.3.4#803005)