[jira] [Created] (ARROW-4268) [C++] Add C primitive to Arrow:Type compile time in TypeTraits

2019-01-15 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4268:
-

 Summary: [C++] Add C primitive to Arrow:Type compile time in 
TypeTraits
 Key: ARROW-4268
 URL: https://issues.apache.org/jira/browse/ARROW-4268
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


The user would use something like
```
...
using ArrowType = typename CTypeTraits<T>::ArrowType;
using ArrayType = typename CTypeTraits<T>::ArrayType;

auto type = CTypeTraits<T>::type_singleton();
```
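
As a rough illustration of the idea, here is a minimal, self-contained sketch of a compile-time mapping from a C primitive to a type class. `DataType`, `Int32Type`, `DoubleType`, the array structs and `ArrowNameOf` are hypothetical stand-ins defined here for the example, not the real Arrow classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for Arrow's type and array classes.
struct DataType {
  virtual ~DataType() = default;
  virtual std::string name() const = 0;
};
struct Int32Type : DataType {
  std::string name() const override { return "int32"; }
};
struct DoubleType : DataType {
  std::string name() const override { return "double"; }
};
struct Int32Array {};
struct DoubleArray {};

// The primary template is left undefined, so an unsupported C type
// fails at compile time instead of at run time.
template <typename CType>
struct CTypeTraits;

template <>
struct CTypeTraits<int32_t> {
  using ArrowType = Int32Type;
  using ArrayType = Int32Array;
  static std::shared_ptr<DataType> type_singleton() {
    static const std::shared_ptr<DataType> instance =
        std::make_shared<Int32Type>();
    return instance;
  }
};

template <>
struct CTypeTraits<double> {
  using ArrowType = DoubleType;
  using ArrayType = DoubleArray;
  static std::shared_ptr<DataType> type_singleton() {
    static const std::shared_ptr<DataType> instance =
        std::make_shared<DoubleType>();
    return instance;
  }
};

// Generic code can go from a C type to the runtime type object.
template <typename CType>
std::string ArrowNameOf() {
  auto type = CTypeTraits<CType>::type_singleton();
  assert(type != nullptr);
  return type->name();
}
```

With this sketch, `ArrowNameOf<int32_t>()` resolves the type entirely from the template argument; no runtime switch on a type id is needed.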



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3998) Support TPC-H dbgen in Arrow

2018-12-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-3998:
-

 Summary: Support TPC-H dbgen in Arrow
 Key: ARROW-3998
 URL: https://issues.apache.org/jira/browse/ARROW-3998
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Francois Saint-Jacques


Integration tests and benchmarks should read TPC-H data. This is going to be 
useful for future query execution engine benchmarking.

It could also attract researchers.





[jira] [Created] (ARROW-3990) [Python] developer documentation is missing double-conversion dep

2018-12-10 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-3990:
-

 Summary: [Python] developer documentation is missing 
double-conversion dep
 Key: ARROW-3990
 URL: https://issues.apache.org/jira/browse/ARROW-3990
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-4084) Simplify Status and stringstream boilerplate

2018-12-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4084:
-

 Summary: Simplify Status and stringstream boilerplate
 Key: ARROW-4084
 URL: https://issues.apache.org/jira/browse/ARROW-4084
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


There's a lot of stringstream repetition when creating a Status.
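
One possible shape for such a simplification, sketched under assumptions: a variadic factory that streams all of its arguments into the message, so callers no longer declare a `std::stringstream` themselves. The `Status` class below is a hypothetical minimal stand-in, and `CheckColumns` is an invented caller; neither is the real `arrow::Status` API.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical minimal Status, standing in for arrow::Status.
class Status {
 public:
  Status() = default;  // OK status
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}

  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

  // Variadic factory: streams every argument into a single message.
  template <typename... Args>
  static Status Invalid(Args&&... args) {
    std::ostringstream ss;
    // C++11-compatible pack expansion; evaluates left to right.
    int unused[] = {0, ((ss << std::forward<Args>(args)), 0)...};
    (void)unused;
    return Status(ss.str());
  }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Invented caller: no stringstream boilerplate at the call site.
inline Status CheckColumns(int expected, int actual) {
  assert(expected >= 0 && actual >= 0);
  if (expected != actual) {
    return Status::Invalid("expected ", expected, " columns, got ", actual);
  }
  return Status();
}
```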





[jira] [Created] (ARROW-4102) [C++] FixedSizeBinary identity cast not implemented

2018-12-21 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4102:
-

 Summary: [C++] FixedSizeBinary identity cast not implemented
 Key: ARROW-4102
 URL: https://issues.apache.org/jira/browse/ARROW-4102
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.12.0








[jira] [Created] (ARROW-3862) Improve dependencies download script

2018-11-23 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-3862:
-

 Summary: Improve dependencies download script 
 Key: ARROW-3862
 URL: https://issues.apache.org/jira/browse/ARROW-3862
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-4407) [CMake] ExternalProject_Add does not capture CC/CXX correctly

2019-01-28 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4407:
-

 Summary: [CMake] ExternalProject_Add does not capture CC/CXX 
correctly
 Key: ARROW-4407
 URL: https://issues.apache.org/jira/browse/ARROW-4407
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.0
Reporter: Francois Saint-Jacques


The issue is that the CC/CXX environment variables are captured on the first 
invocation of the builder (e.g. make or ninja) instead of when CMake is invoked 
in the build directory. This can lead to compilation errors (notably when 
compiling with clang in the top directory due to the addition of the 
`-Qunused-arguments` option).

In my case, a script prepares the build directory and exports CXX within the 
script. When I jump into the build folder, there's a mismatch between the 
compiler used for the external gbenchmark (and all deps if conda is not used) 
and the one used for the build.

To reproduce:
# Create a new build directory with clang as the compiler; don't build yet.
# In a new shell (without the compiler environment variable), go into the 
directory and invoke make/ninja.





[jira] [Created] (ARROW-5070) [Benchmarking] Support for on-demand and automated benchmarks

2019-03-29 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5070:
-

 Summary: [Benchmarking] Support for on-demand and automated 
benchmarks
 Key: ARROW-5070
 URL: https://issues.apache.org/jira/browse/ARROW-5070
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


We want to be able to request a benchmark comparison of a PR against 
master. This should be triggered via a GitHub comment.

The automated benchmarks would track the master branch and be triggered either 
on each merge or nightly.





[jira] [Created] (ARROW-5071) [C++] Write CMake benchmark wrappers

2019-03-29 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5071:
-

 Summary: [C++] Write CMake benchmark wrappers
 Key: ARROW-5071
 URL: https://issues.apache.org/jira/browse/ARROW-5071
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.14.0


Write a script that wraps a Google Benchmark binary and converts its output to 
the format required by the database schema.





[jira] [Created] (ARROW-5011) [Release] Add support in the source release script for custom hash

2019-03-26 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5011:
-

 Summary: [Release] Add support in the source release script for 
custom hash
 Key: ARROW-5011
 URL: https://issues.apache.org/jira/browse/ARROW-5011
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques
 Fix For: 0.13.0


This is a minor feature to help debug said script by overriding the 
git-archive hash, instead of using the hash inferred from the release tag.





[jira] [Created] (ARROW-5010) [Release] Fix release script with llvm-7

2019-03-26 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5010:
-

 Summary: [Release] Fix release script with llvm-7
 Key: ARROW-5010
 URL: https://issues.apache.org/jira/browse/ARROW-5010
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


The source release script fails to compile Gandiva because it requires llvm-7 
and only llvm-6 is available in the ubuntu18 Docker image.





[jira] [Created] (ARROW-5036) [C++] Serialization tests resort to memcpy to check equality

2019-03-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5036:
-

 Summary: [C++] Serialization tests resort to memcpy to check 
equality
 Key: ARROW-5036
 URL: https://issues.apache.org/jira/browse/ARROW-5036
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Plasma
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0


{code:shell}
1: 
/tmp/arrow-0.13.0.Q4czW/apache-arrow-0.13.0/cpp/src/plasma/test/serialization_tests.cc:193:
 Failure
1: Expected equality of these values:
1:   memcmp(_objects[object_ids[0]], _objects_return[0], 
sizeof(PlasmaObject))
1: Which is: 45
1:   0
1: [  FAILED  ] PlasmaSerialization.GetReply (0 ms)
{code}

The source of the problem is the stack-allocated random_plasma_object object. 
As a fix, I propose that PlasmaObject implement `operator==` and that the test 
drop the memcmp equality check.
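
To make the proposal concrete, here is a minimal sketch of why a byte-wise comparison of structs is fragile and what a field-wise `operator==` looks like. `Obj` is a hypothetical struct with padding, not the real `PlasmaObject`, and `CompareObjects` is an invented demo function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical struct; on common ABIs there are padding bytes
// between `device` and `size`.
struct Obj {
  int8_t device;
  int64_t size;
};

// Field-wise equality: padding bytes never take part in the result.
inline bool operator==(const Obj& a, const Obj& b) {
  return a.device == b.device && a.size == b.size;
}

inline void CompareObjects() {
  Obj a, b;
  std::memset(&a, 0x00, sizeof(Obj));  // padding bytes start as 0x00
  std::memset(&b, 0xFF, sizeof(Obj));  // padding bytes start as 0xFF
  a.device = b.device = 1;
  a.size = b.size = 42;
  assert(a == b);  // equal as values
  // std::memcmp(&a, &b, sizeof(Obj)) may be non-zero here, because the
  // padding bytes are not part of the value and can differ.
}
```

The same reasoning applies to a struct that is stack allocated and only partially initialized: its padding holds whatever bytes were on the stack.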






[jira] [Created] (ARROW-5005) [C++] Add support for filter mask in AggregateFunction

2019-03-25 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5005:
-

 Summary: [C++] Add support for filter mask in AggregateFunction
 Key: ARROW-5005
 URL: https://issues.apache.org/jira/browse/ARROW-5005
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.14.0


The aggregate kernels don't support a mask (the result of a filter). Add the 
following method to `AggregateFunction`.

{code:c++}
virtual Status ConsumeWithFilter(const Array& input, const Array& mask, void* 
state) const = 0;
{code}
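
A minimal sketch of the intended semantics, assuming a sum aggregate: only values whose mask entry is true are consumed. `SumState` and the `std::vector`-based signature are hypothetical simplifications for illustration; the proposed method takes `Array` inputs and returns a `Status`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aggregate state for a sum kernel.
struct SumState {
  int64_t sum = 0;
  int64_t count = 0;
};

// Consumes only the values selected by `mask`.
// Returns false on a length mismatch (standing in for an error Status).
inline bool ConsumeWithFilter(const std::vector<int64_t>& input,
                              const std::vector<bool>& mask,
                              SumState* state) {
  assert(state != nullptr);
  if (input.size() != mask.size()) return false;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (mask[i]) {
      state->sum += input[i];
      ++state->count;
    }
  }
  return true;
}
```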






[jira] [Created] (ARROW-5007) [C++] Move DCHECK out of sse-utils

2019-03-25 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5007:
-

 Summary: [C++] Move DCHECK out of sse-utils 
 Key: ARROW-5007
 URL: https://issues.apache.org/jira/browse/ARROW-5007
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Some users tried to compile Arrow on ppc64, but they hit the following errors:

{code:bash}
In file included from /root/repos/arrow/cpp/src/arrow/json/chunker.h:26:0,
 from /root/repos/arrow/cpp/src/arrow/json/chunker.cc:18:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘__m128i 
arrow::SSE4_cmpestrm(__m128i, int, __m128i, int)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: error: there are no 
arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of 
‘DCHECK’ must be available [-fpermissive]
   DCHECK(false) << "CPU doesn't support SSE 4.2";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:125:3: note: (if you use 
‘-fpermissive’, G++ will accept your code, but allowing the use of an 
undeclared name is deprecated)
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘int 
arrow::SSE4_cmpestri(__m128i, int, __m128i, int)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:131:3: error: there are no 
arguments to ‘DCHECK’ that depend on a template parameter, so a declaration of 
‘DCHECK’ must be available [-fpermissive]
   DCHECK(false) << "CPU doesn't support SSE 4.2";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t 
arrow::SSE4_crc32_u8(uint32_t, uint8_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:136:3: error: ‘DCHECK’ was not 
declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t 
arrow::SSE4_crc32_u16(uint32_t, uint16_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:141:3: error: ‘DCHECK’ was not 
declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t 
arrow::SSE4_crc32_u32(uint32_t, uint32_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:146:3: error: ‘DCHECK’ was not 
declared in this scope
   DCHECK(false) << "SSE support is not enabled";
   ^~
/root/repos/arrow/cpp/src/arrow/util/sse-util.h: In function ‘uint32_t 
arrow::SSE4_crc32_u64(uint32_t, uint64_t)’:
/root/repos/arrow/cpp/src/arrow/util/sse-util.h:151:3: error: ‘DCHECK’ was not 
declared in this scope
   DCHECK(false) << "SSE support is not enabled";
{code}

Including `logging.h` or removing `DCHECK` lets them compile. The fix should be 
to move the SSE detection macro out of this file, so that code needing only the 
detection includes a small macro-detection header instead of this file.
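
A sketch of what such a detection-only header could look like. The macro name `EXAMPLE_HAVE_SSE4_2` and the function `HaveSse42` are hypothetical; `__SSE4_2__` is the macro GCC and Clang predefine when SSE 4.2 code generation is enabled.

```cpp
// Detection-only header sketch: no intrinsics, no logging, no DCHECK.
#if defined(__SSE4_2__)
#define EXAMPLE_HAVE_SSE4_2 1
#else
#define EXAMPLE_HAVE_SSE4_2 0
#endif

// Callers that only need to know whether SSE 4.2 is available can
// branch on this without including the intrinsics wrappers.
constexpr bool HaveSse42() { return EXAMPLE_HAVE_SSE4_2 == 1; }
```

On a target without SSE, such as ppc64, the macro is simply 0 and nothing in this header references `DCHECK`.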





[jira] [Created] (ARROW-4673) [C++] Implement AssertDatumEquals

2019-02-25 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4673:
-

 Summary: [C++] Implement AssertDatumEquals
 Key: ARROW-4673
 URL: https://issues.apache.org/jira/browse/ARROW-4673
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Aggregate tests could benefit from this.





[jira] [Created] (ARROW-4694) [CI] detect-changes.py is inconsistent

2019-02-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4694:
-

 Summary: [CI] detect-changes.py is inconsistent
 Key: ARROW-4694
 URL: https://issues.apache.org/jira/browse/ARROW-4694
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Affects Versions: 0.12.1
Reporter: Francois Saint-Jacques


Some examples of pull-requests with wrong affected files:
   - [pr-3762|https://github.com/apache/arrow/pull/3762/files] shouldn't 
trigger [javascript|https://travis-ci.org/apache/arrow/jobs/498805479#L217]
   - [pr-3767|https://github.com/apache/arrow/pull/3767/files] shouldn't affect 
files found in [rust|https://travis-ci.org/apache/arrow/jobs/499122044] and 
[javascript|https://travis-ci.org/apache/arrow/jobs/499122041#L217]

In 
[get_travis_commit_range|https://github.com/apache/arrow/blob/master/ci/detect-changes.py#L63-L67]
 , it references the following 
[comment|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-139811122].
 If you read further down in the 
[thread|https://github.com/travis-ci/travis-ci/issues/4596#issuecomment-434532772],
 you'll note that it can misbehave due to clone shallowness and the commit of 
branch creation. I'm not sure if this is the issue.





[jira] [Created] (ARROW-4696) Verify release script is over optimist with CUDA detection

2019-02-27 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4696:
-

 Summary: Verify release script is over optimist with CUDA detection
 Key: ARROW-4696
 URL: https://issues.apache.org/jira/browse/ARROW-4696
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Francois Saint-Jacques


I have an NVIDIA GPU without CUDA installed. Every time I run the verification 
scripts, they fail midway because ARROW_HAVE_CUDA evaluates to yes, since 
`nvidia-smi --list-gpus` succeeds. This wastes a lot of time if I forget 
about it.

Would it be better to check for `CUDA_HOME`?





[jira] [Created] (ARROW-4728) [Javascript] Failing test Table#assign with a zero-length Null column round-trips through serialization

2019-03-01 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4728:
-

 Summary: [Javascript] Failing test Table#assign with a zero-length 
Null column round-trips through serialization
 Key: ARROW-4728
 URL: https://issues.apache.org/jira/browse/ARROW-4728
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Affects Versions: 0.12.1
Reporter: Francois Saint-Jacques
 Fix For: 0.13.0


See https://travis-ci.org/apache/arrow/jobs/500414242#L1002
{code:javascript}
  ● Table#serialize() › Table#assign with an empty table round-trips through 
serialization
expect(received).toBe(expected) // Object.is equality
Expected: 86
Received: 41
  91 | const source = table1.assign(Table.empty());
  92 | expect(source.numCols).toBe(table1.numCols);
> 93 | expect(source.length).toBe(table1.length);
 |   ^
  94 | const result = Table.from(source.serialize());
  95 | expect(result).toEqualTable(source);
  96 | expect(result.schema.metadata.get('foo')).toEqual('bar');
  at Object.test (test/unit/table/serialize-tests.ts:93:35)
  ● Table#serialize() › Table#assign with a zero-length Null column round-trips 
through serialization
expect(received).toBe(expected) // Object.is equality
{code}






[jira] [Created] (ARROW-4729) [C++] Improve buffer symbolic index

2019-03-01 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4729:
-

 Summary: [C++] Improve buffer symbolic index
 Key: ARROW-4729
 URL: https://issues.apache.org/jira/browse/ARROW-4729
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.12.1
Reporter: Francois Saint-Jacques


The array data `buffers` vector is indexed differently depending on the Array 
type. This feature would expose static constexpr named variables for the buffer 
indices.
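
A minimal sketch of the idea, assuming the buffer order of the Arrow columnar format (validity bitmap first, then values for primitive arrays, or offsets followed by data for variable-length binary arrays). The struct and constant names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical named indices into ArrayData's `buffers` vector.
struct PrimitiveBuffers {
  static constexpr std::size_t kValidity = 0;
  static constexpr std::size_t kValues = 1;
};

struct BinaryBuffers {
  static constexpr std::size_t kValidity = 0;
  static constexpr std::size_t kOffsets = 1;
  static constexpr std::size_t kData = 2;
};
```

Code would then read `buffers[BinaryBuffers::kOffsets]` instead of `buffers[1]`, which makes the intent visible at the call site.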





[jira] [Created] (ARROW-4765) [JAVA][Flight] Memory leak

2019-03-04 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4765:
-

 Summary: [JAVA][Flight] Memory leak
 Key: ARROW-4765
 URL: https://issues.apache.org/jira/browse/ARROW-4765
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Java
Affects Versions: 0.12.1
Reporter: Francois Saint-Jacques


There is a potential race issue when reclaiming the FlightServer.
{code:java}
[ERROR] ensureIndependentSteams(org.apache.arrow.flight.TestBackPressure) Time 
elapsed: 1.394 s <<< ERROR!
java.lang.IllegalStateException: 
Memory was leaked by query. Memory leaked: (131072)
Allocator(perf-server) 0/131072/589824/9223372036854775807 
(res/actual/peak/limit)
at 
org.apache.arrow.flight.TestBackPressure.ensureIndependentSteams(TestBackPressure.java:76)
{code}






[jira] [Created] (ARROW-4643) [C++] Add compiler diagnostic color when using Ninja

2019-02-20 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4643:
-

 Summary: [C++] Add compiler diagnostic color when using Ninja
 Key: ARROW-4643
 URL: https://issues.apache.org/jira/browse/ARROW-4643
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


Because of a [ninja-ism|https://github.com/ninja-build/ninja/issues/174], this 
change forces colored errors/warnings.

Very handy for C++.





[jira] [Created] (ARROW-4654) [C++] Implicit Flight target dependencies cause compilation failure

2019-02-21 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4654:
-

 Summary: [C++] Implicit Flight target dependencies cause 
compilation failure
 Key: ARROW-4654
 URL: https://issues.apache.org/jira/browse/ARROW-4654
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, FlightRPC
Affects Versions: 0.12.0
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques



{code:sh}
In file included from ../src/arrow/flight/internal.h:23:0,
 from ../src/arrow/python/flight.cc:20:
../src/arrow/flight/protocol-internal.h:22:10: fatal error: 
arrow/flight/Flight.grpc.pb.h: No such file or directory
 #include "arrow/flight/Flight.grpc.pb.h"  // IWYU pragma: export
  ^~
{code}






[jira] [Created] (ARROW-4660) [C++] gflags fails to build due to CMake error

2019-02-22 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4660:
-

 Summary: [C++] gflags fails to build due to CMake error
 Key: ARROW-4660
 URL: https://issues.apache.org/jira/browse/ARROW-4660
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.13.0
Reporter: Francois Saint-Jacques


gflags fails to build as a thirdparty download on Linux with CMake 3.10.2. 
Removing the line `target_compile_definitions(${GFLAGS_LIBRARY} INTERFACE 
"GFLAGS_IS_A_DLL=0")` makes it build without issue.
{code}
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:658 
(target_compile_definitions):
Cannot specify compile definitions for imported target "gflags_static".
Call Stack (most recent call first):
CMakeLists.txt:506 (include)
{code}





[jira] [Created] (ARROW-4776) [C++] DictionaryBuilder should support bootstrapping from an existing dict type

2019-03-05 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4776:
-

 Summary: [C++] DictionaryBuilder should support bootstrapping from 
an existing dict type
 Key: ARROW-4776
 URL: https://issues.apache.org/jira/browse/ARROW-4776
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This would mean adding a new DictionaryBuilder constructor that receives a 
dictionary type and performs a lazy deep copy if there's any modification. 
We'll have to investigate how this translates into API ergonomics.
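
The lazy deep copy could be sketched as copy-on-write, as below. `ToyDictionaryBuilder` is a hypothetical toy class working on a vector of strings; it is not the `arrow::DictionaryBuilder` API and it uses a linear search only to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy builder bootstrapped from an existing dictionary.
class ToyDictionaryBuilder {
 public:
  using Dictionary = std::vector<std::string>;

  explicit ToyDictionaryBuilder(std::shared_ptr<const Dictionary> existing)
      : shared_(std::move(existing)) {
    assert(shared_ != nullptr);
  }

  // Returns the index of `value`, appending it if it is missing.
  int32_t GetOrInsert(const std::string& value) {
    const Dictionary& dict = owned_ ? *owned_ : *shared_;
    for (std::size_t i = 0; i < dict.size(); ++i) {
      if (dict[i] == value) return static_cast<int32_t>(i);
    }
    // First modification: deep-copy the shared dictionary lazily.
    if (!owned_) owned_.reset(new Dictionary(*shared_));
    owned_->push_back(value);
    return static_cast<int32_t>(owned_->size() - 1);
  }

  // True once the builder has made its own copy.
  bool copied() const { return owned_ != nullptr; }

 private:
  std::shared_ptr<const Dictionary> shared_;  // never mutated
  std::unique_ptr<Dictionary> owned_;         // created on first insert
};
```

Lookups of existing values never trigger the copy, so a builder that only re-encodes known values shares the original dictionary for its whole lifetime.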





[jira] [Created] (ARROW-4779) [CI] AppVeyor link failure

2019-03-05 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4779:
-

 Summary: [CI] AppVeyor link failure
 Key: ARROW-4779
 URL: https://issues.apache.org/jira/browse/ARROW-4779
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/22841788/job/i0bmixvlw67ty284#L671


{code:java}
  Version 14.00.24241.7
  ExceptionCode= C005
  ExceptionFlags   = 
  ExceptionAddress = 7FF78516AE57 (7FF78513) 
"C:\PROGRA~2\MI0E91~1.0\VC\bin\amd64\link.exe"
  NumberParameters = 0002
  ExceptionInformation[ 0] = 
  ExceptionInformation[ 1] = 000201EF7BF0
CONTEXT:
  Rax= 0011  R8 = 
  Rbx= 00CE87C812A0  R9 = 7FF78522EA30
  Rcx= 7FF78522EA30  R10= 
  Rdx= 0011  R11= 00CE8834F0C0
  Rsp= 00CE8834DC00  R12= 
  Rbp= 00CE8834DD00  E13= 
  Rsi=   R14= 0100
  Rdi= 000201EF7BF0  R15= 0001
  Rip= 7FF78516AE57  EFlags = 00010202
  SegCs  = 0033  SegDs  = 002B
  SegSs  = 002B  SegEs  = 002B
  SegFs  = 0053  SegGs  = 002B
  Dr0=   Dr3= 
  Dr1=   Dr6= 
  Dr2=   Dr7= 
LINK : fatal error LNK1000: unknown error at 7FF78516AE1A; consult 
documentation for technical support options
[189/282] Building CXX object 
src\arrow\CMakeFiles\arrow-scalar-test.dir\scalar-test.cc.obj
[190/282] Building CXX object 
src\arrow\CMakeFiles\arrow-public-api-test.dir\public-api-test.cc.obj
ninja: build stopped: subcommand failed.
C:\projects\arrow\cpp\build-debug>goto scriptexit 
{code}






[jira] [Created] (ARROW-4962) [C++] Warning level to CHECKIN can't compile on modern GCC

2019-03-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4962:
-

 Summary: [C++] Warning level to CHECKIN can't compile on modern GCC
 Key: ARROW-4962
 URL: https://issues.apache.org/jira/browse/ARROW-4962
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.1
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.13.0


This is somewhat related to the recent DCHECK change.





[jira] [Created] (ARROW-4838) [C++] Implement safe Make constructor

2019-03-12 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4838:
-

 Summary: [C++] Implement safe Make constructor
 Key: ARROW-4838
 URL: https://issues.apache.org/jira/browse/ARROW-4838
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0


The following classes need validating constructors:

* ArrayData
* ChunkedArray
* RecordBatch
* Column
* Table
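
As an illustration of what a validating constructor could look like for one of the classes above, here is a sketch of a static `Make` factory that checks invariants before the object exists. `ToyBatch` is a hypothetical stand-in; the real classes would report failures through `Status` rather than a bool and an error string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RecordBatch-like class.
class ToyBatch {
 public:
  using Column = std::vector<int32_t>;

  // Validating factory: every column must have exactly `num_rows` values.
  static bool Make(int64_t num_rows, std::vector<Column> columns,
                   std::unique_ptr<ToyBatch>* out, std::string* error) {
    assert(out != nullptr && error != nullptr);
    if (num_rows < 0) {
      *error = "negative row count";
      return false;
    }
    for (std::size_t i = 0; i < columns.size(); ++i) {
      if (static_cast<int64_t>(columns[i].size()) != num_rows) {
        *error = "column " + std::to_string(i) + " has the wrong length";
        return false;
      }
    }
    out->reset(new ToyBatch(num_rows, std::move(columns)));
    return true;
  }

  int64_t num_rows() const { return num_rows_; }

 private:
  // Private: objects can only be created through Make.
  ToyBatch(int64_t num_rows, std::vector<Column> columns)
      : num_rows_(num_rows), columns_(std::move(columns)) {}

  int64_t num_rows_;
  std::vector<Column> columns_;
};
```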





[jira] [Created] (ARROW-4990) [C++] Kernel to compare array with array

2019-03-21 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4990:
-

 Summary: [C++] Kernel to compare array with array
 Key: ARROW-4990
 URL: https://issues.apache.org/jira/browse/ARROW-4990
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0








[jira] [Created] (ARROW-4999) [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes

2019-03-22 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4999:
-

 Summary: [Doc] Add examples on how to construct with 
ArrayData::Make instead of builder classes
 Key: ARROW-4999
 URL: https://issues.apache.org/jira/browse/ARROW-4999
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0








[jira] [Created] (ARROW-4564) [C++] IWYU docker image silently fails

2019-02-13 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4564:
-

 Summary: [C++] IWYU docker image silently fails
 Key: ARROW-4564
 URL: https://issues.apache.org/jira/browse/ARROW-4564
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Affects Versions: 0.13.0
Reporter: Francois Saint-Jacques
 Fix For: 0.13.0


[ARROW-4528|https://issues.apache.org/jira/browse/ARROW-4528] silently removed 
`iwyu` from the list of installed packages. `iwyu_tool.py` does _not_ 
propagate errors correctly if the `iwyu` binary is not found. This seems to be 
resolved in a more recent version, which will be addressed in 
[ARROW-4340|https://issues.apache.org/jira/browse/ARROW-4340].





[jira] [Created] (ARROW-4529) Add test coverage for BitUtils::RoundDown

2019-02-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4529:
-

 Summary: Add test coverage for BitUtils::RoundDown
 Key: ARROW-4529
 URL: https://issues.apache.org/jira/browse/ARROW-4529
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-4531) Handling of non-aligned slices in Sum kernel

2019-02-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4531:
-

 Summary: Handling of non-aligned slices in Sum kernel
 Key: ARROW-4531
 URL: https://issues.apache.org/jira/browse/ARROW-4531
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


The Sum kernel does not support slices whose offset is not byte-aligned. 
Other kernels avoid this problem because they use BitmapReader.
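
For illustration, a bit-at-a-time reader handles any offset because it addresses the validity bitmap by bit index rather than by byte. The sketch below assumes Arrow's least-significant-bit-first bitmap order; `GetBit` and `SumValid` are hypothetical helpers, not the actual kernel or the `BitmapReader` class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads bit `i` of a bitmap packed least-significant bit first.
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Sums values in [offset, offset + length) whose validity bit is set.
// The offset does not need to be a multiple of 8.
inline int64_t SumValid(const std::vector<int64_t>& values,
                        const uint8_t* validity, int64_t offset,
                        int64_t length) {
  assert(offset >= 0 && length >= 0);
  assert(static_cast<std::size_t>(offset + length) <= values.size());
  int64_t sum = 0;
  for (int64_t i = offset; i < offset + length; ++i) {
    if (GetBit(validity, i)) sum += values[static_cast<std::size_t>(i)];
  }
  return sum;
}
```

A faster kernel would process whole bytes or words in the aligned middle of the slice and fall back to this per-bit path only for the unaligned head and tail.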





[jira] [Created] (ARROW-4530) Review Aggregate kernel state allocation/ownership semantics

2019-02-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4530:
-

 Summary: Review Aggregate kernel state allocation/ownership 
semantics
 Key: ARROW-4530
 URL: https://issues.apache.org/jira/browse/ARROW-4530
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-4364) [C++] Fix -weverything -wextra compilation errors

2019-01-24 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4364:
-

 Summary: [C++] Fix -weverything -wextra compilation errors
 Key: ARROW-4364
 URL: https://issues.apache.org/jira/browse/ARROW-4364
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.12.0
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.13.0








[jira] [Created] (ARROW-5171) [C++] Use LESS instead of LOWER in compare enum option.

2019-04-15 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5171:
-

 Summary: [C++] Use LESS instead of LOWER in compare enum option.
 Key: ARROW-5171
 URL: https://issues.apache.org/jira/browse/ARROW-5171
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


See https://github.com/apache/arrow/pull/3963#discussion_r275596603





[jira] [Created] (ARROW-5175) [Benchmarking] Decide which benchmarks are part of regression checks

2019-04-16 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5175:
-

 Summary: [Benchmarking] Decide which benchmarks are part of 
regression checks
 Key: ARROW-5175
 URL: https://issues.apache.org/jira/browse/ARROW-5175
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-5489) [C++

2019-06-03 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5489:
-

 Summary: [C++
 Key: ARROW-5489
 URL: https://issues.apache.org/jira/browse/ARROW-5489
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-5464) [Archery] Bad --benchmark-filter default

2019-05-31 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5464:
-

 Summary: [Archery] Bad --benchmark-filter default
 Key: ARROW-5464
 URL: https://issues.apache.org/jira/browse/ARROW-5464
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-5527) [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) for heap data

2019-06-07 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5527:
-

 Summary: [C++] HashTable/MemoTable should use Buffer(s)/Builder(s) 
for heap data
 Key: ARROW-5527
 URL: https://issues.apache.org/jira/browse/ARROW-5527
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The current implementation uses `std::vector` and `std::string` with unbounded 
size. The refactor would take a memory pool in the constructor for buffer 
management and would get rid of vectors.

This will have the side effect of propagating Status to some calls (notably 
insert due to Upsize failing to resize).





[jira] [Created] (ARROW-5530) [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null behavior

2019-06-07 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5530:
-

 Summary: [C++] Add options to ValueCount/Unique/DictEncode kernel 
to toggle null behavior
 Key: ARROW-5530
 URL: https://issues.apache.org/jira/browse/ARROW-5530
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-5611) [C++] Improve clang-tidy speed

2019-06-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5611:
-

 Summary: [C++] Improve clang-tidy speed
 Key: ARROW-5611
 URL: https://issues.apache.org/jira/browse/ARROW-5611
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Francois Saint-Jacques


See https://github.com/apache/arrow/pull/4293#issuecomment-501950675





[jira] [Created] (ARROW-5612) [Python][Documentation] Add prominent note that date_as_object option changed with Arrow 0.13

2019-06-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5612:
-

 Summary: [Python][Documentation] Add prominent note that 
date_as_object option changed with Arrow 0.13
 Key: ARROW-5612
 URL: https://issues.apache.org/jira/browse/ARROW-5612
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-5652) [CI] Fix iwyu docker image

2019-06-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5652:
-

 Summary: [CI] Fix iwyu docker image
 Key: ARROW-5652
 URL: https://issues.apache.org/jira/browse/ARROW-5652
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


See 
[https://travis-ci.org/ursa-labs/crossbow/builds/547691665?utm_source=github_status_medium=notification]





[jira] [Created] (ARROW-5653) [CI] Fix cpp docker image

2019-06-19 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5653:
-

 Summary: [CI] Fix cpp docker image
 Key: ARROW-5653
 URL: https://issues.apache.org/jira/browse/ARROW-5653
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:shell}
make -f Makefile.docker run-cpp
...
54/64 Test #79: arrow-dataset-file_test ***Failed0.04 sec
Running arrow-dataset-file_test, redirecting output into 
/build/cpp/build/test-logs/arrow-dataset-file_test.txt (attempt 1/1)
/build/cpp/debug/arrow-dataset-file_test: error while loading shared libraries: 
libbrotlienc.so.1: cannot open shared object file: No such file or directory
/build/cpp/src/arrow/dataset

  Start 80: arrow-flight-test
55/64 Test #80: arrow-flight-test ..***Failed0.04 sec
Running arrow-flight-test, redirecting output into 
/build/cpp/build/test-logs/arrow-flight-test.txt (attempt 1/1)
/build/cpp/debug/arrow-flight-t
{code}





[jira] [Created] (ARROW-5680) [Rust] Nightly tests are failing

2019-06-21 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5680:
-

 Summary: [Rust] Nightly tests are failing
 Key: ARROW-5680
 URL: https://issues.apache.org/jira/browse/ARROW-5680
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Francois Saint-Jacques


See 
https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link

Once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further 
failures, e.g.


{code:bash}
running 18 tests
test csv_query_group_by_int_min_max ... FAILED
test csv_query_external_table_count ... ok
test csv_query_count ... ok
test csv_count_star ... ok
test csv_query_avg ... ok
test csv_query_avg_multi_batch ... ok
test csv_query_cast ... ok
test csv_query_group_by_avg ... FAILED
test csv_query_group_by_string_min_max ... FAILED
test csv_query_group_by_int_count ... FAILED
test csv_query_limit ... ok
test csv_query_limit_bigger_than_nbr_of_rows ... ok
test csv_query_limit_with_same_nbr_of_rows ... ok
test csv_query_cast_literal ... ok
test csv_query_limit_zero ... ok
test csv_query_create_external_table ... ok
test csv_query_with_predicate ... ok
test parquet_query ... ok

failures:

 csv_query_group_by_int_min_max stdout 
thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left 
== right)`
  left: 
`"4\t0.02182578039211991\t0.9237877978193884\n5\t0.0147930530301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
 right: 
`"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.0147930530301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`',
 datafusion/tests/sql.rs:77:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

 csv_query_group_by_avg stdout 
thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == right)`
  left: 
`"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
 right: 
`"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`',
 datafusion/tests/sql.rs:99:5

 csv_query_group_by_string_min_max stdout 
thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: 
`(left == right)`
  left: 
`"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
 right: 
`"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.0147930530301\t0.9965400387585364\n"`',
 datafusion/tests/sql.rs:187:5

 csv_query_group_by_int_count stdout 
thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left == 
right)`
  left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
 right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', 
datafusion/tests/sql.rs:175:5
{code}

I suspect that the tests expect the group-by results in a fixed order, which 
would depend heavily on the iteration order of the hash table. Note that once 
I did a rustup update (and docker rmi rustlang/rust:nightly), the failures 
went away.
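
If the ordering suspicion is right, one way to make such assertions robust is to compare the rows as an unordered collection. The following is only a sketch of that idea, written in C++ rather than in the Rust of the DataFusion tests; `SplitRows` and `SameRowsIgnoringOrder` are hypothetical helper names, not existing functions.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a newline-delimited result dump into rows, dropping empty lines.
std::vector<std::string> SplitRows(const std::string& text) {
  std::vector<std::string> rows;
  std::istringstream stream(text);
  std::string row;
  while (std::getline(stream, row)) {
    if (!row.empty()) rows.push_back(row);
  }
  return rows;
}

// Compare two result dumps while ignoring the order of the rows, so the
// check no longer depends on hash table iteration order.
bool SameRowsIgnoringOrder(const std::string& left, const std::string& right) {
  std::vector<std::string> a = SplitRows(left);
  std::vector<std::string> b = SplitRows(right);
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

Applied to the `csv_query_group_by_int_count` strings above, both sides contain the same five rows, so such a comparison would pass. An alternative would be to add an explicit ordering to the queries themselves.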





[jira] [Created] (ARROW-5544) [Archery] should not return non-zero in `benchmark diff` sub command on regression

2019-06-10 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5544:
-

 Summary: [Archery] should not return non-zero in `benchmark diff` 
sub command on regression
 Key: ARROW-5544
 URL: https://issues.apache.org/jira/browse/ARROW-5544
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


When a regression is detected but the command itself ran successfully, it 
should return zero. Currently it returns the number of regressions. This is to 
play better with ursabot. It should be left to the user to decide what to do 
with the JSON data.





[jira] [Created] (ARROW-5269) [C++] Whitelist benchmarks candidates for regression checks

2019-05-06 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5269:
-

 Summary: [C++] Whitelist benchmarks candidates for regression 
checks
 Key: ARROW-5269
 URL: https://issues.apache.org/jira/browse/ARROW-5269
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 0.14.0


Rename all benchmarks that are candidates for regression checks with the 
`Regression` prefix.





[jira] [Created] (ARROW-5251) [C++][Parquet] Bad initialization in statistics computation

2019-05-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5251:
-

 Summary: [C++][Parquet] Bad initialization in statistics 
computation
 Key: ARROW-5251
 URL: https://issues.apache.org/jira/browse/ARROW-5251
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The following lines are undefined if the first element is null.

https://github.com/apache/arrow/blob/250e97c70f497581bca412dfd2a654a1f9736064/cpp/src/parquet/statistics.cc#L159-L160
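
As an illustration only, and assuming the problem is that min/max are seeded from the first slot regardless of validity, a null-safe computation could look like the C++17 sketch below. `MinMaxSkippingNulls` and the `is_valid` vector are hypothetical stand-ins, not the code in statistics.cc.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Compute (min, max) over the non-null entries of `values`.
// Returns std::nullopt when every entry is null, so the result is never
// seeded from a null slot.
template <typename T>
std::optional<std::pair<T, T>> MinMaxSkippingNulls(
    const std::vector<T>& values, const std::vector<bool>& is_valid) {
  std::optional<std::pair<T, T>> result;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!is_valid[i]) continue;  // the slot content is unspecified, skip it
    if (!result) {
      // The first valid value seeds both bounds.
      result = std::make_pair(values[i], values[i]);
    } else {
      if (values[i] < result->first) result->first = values[i];
      if (result->second < values[i]) result->second = values[i];
    }
  }
  return result;
}
```

The point of the sketch is that the seed comes from the first valid value rather than from index 0.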





[jira] [Created] (ARROW-5253) [C++] external Snappy fails on Alpine

2019-05-03 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5253:
-

 Summary: [C++] external Snappy fails on Alpine
 Key: ARROW-5253
 URL: https://issues.apache.org/jira/browse/ARROW-5253
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.13.0
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0



{code:bash}
FAILED: debug/libarrow.so.14.0.0 
: && /usr/bin/c++ -fPIC -Wno-noexcept-type  -fdiagnostics-color=always -ggdb 
-O0  -Wall -Wno-conversion -Wno-sign-conversion -Wno-unused-variable -Werror 
-msse4.2  -g  
-Wl,--version-script=/buildbot/amd64-alpine-3_9-cpp/cpp/src/arrow/symbols.map 
-shared -Wl,-soname,libarrow.so.14 -o debug/libarrow.so.14.0.0 
...
c++: error: snappy_ep/src/snappy_ep-install/lib/libsnappy.a: No such file or 
directory
{code}






[jira] [Created] (ARROW-5739) [CI] Fix docker python build

2019-06-26 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5739:
-

 Summary: [CI] Fix docker python build
 Key: ARROW-5739
 URL: https://issues.apache.org/jira/browse/ARROW-5739
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


The python docker image fails to clean the build directory, so it installs the 
output of a previous invocation of `docker-compose run python`. This does not 
affect CI, which drops the `/build` mount; it only affects local users.





[jira] [Created] (ARROW-5779) [R][CI] R's docker image fails due to incompatibility

2019-06-28 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5779:
-

 Summary: [R][CI] R's docker image fails due to incompatibility
 Key: ARROW-5779
 URL: https://issues.apache.org/jira/browse/ARROW-5779
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques



{code:bash}
The downloaded source packages are in
'/tmp/RtmpLu0eiq/downloaded_packages'
v  checking for file 
'/tmp/RtmpLu0eiq/remotes1a8d7c759a55/romainfrancois-decor-6c5a5aa/DESCRIPTION' 
...
-  preparing 'decor':
v  checking DESCRIPTION meta-information ...
-  cleaning src
-  checking for LF line-endings in source and make files and shell scripts
-  checking for empty or unneeded directories
-  building 'decor_0.0.0.9001.tar.gz'
   
Installing package into '/usr/local/lib/R/site-library'
(as 'lib' is unspecified)
ERROR: this R is version 3.4.4, package 'decor' requires R >= 3.5.0
Error: Failed to install 'decor' from GitHub:
  (converted from warning) installation of package 
'/tmp/RtmpLu0eiq/file1a8d6986708c/decor_0.0.0.9001.tar.gz' had non-zero exit 
status
Execution halted
ERROR: Service 'r' failed to build: The command '/bin/sh -c Rscript -e 
"install.packages('devtools', repos = 'http://cran.rstudio.com')" && 
Rscript -e "devtools::install_github('romainfrancois/decor')" && Rscript -e 
"install.packages(c( 'Rcpp', 'dplyr', 'stringr', 'glue', 'vctrs',   
  'purrr', 'assertthat', 'fs', 'tibble', 
'crayon', 'testthat', 'bit64', 'hms', 
'lubridate'), repos = 'https://cran.rstudio.com')"' returned a non-zero 
code: 1
Makefile.docker:49: recipe for target 'build-r' failed

{code}

I'm not sure if the fix is just to bump R's version in the image, or avoid the 
failing package. cc [~romainfrancois]






[jira] [Created] (ARROW-5914) [CI] Build bundled dependencies in docker build step

2019-07-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5914:
-

 Summary: [CI] Build bundled dependencies in docker build step
 Key: ARROW-5914
 URL: https://issues.apache.org/jira/browse/ARROW-5914
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


In the recently introduced ARROW-5803, some heavy dependencies (thrift, 
protobuf, flatbuffers, grpc) are built at each invocation of docker-compose 
build (thus each travis test).

We should aim to build the third party dependencies in docker build phase 
instead, to exploit caching and docker-compose pull so that the CI step doesn't 
need to build said dependencies each time.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-5923) [C++] Fix int96 comment

2019-07-12 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5923:
-

 Summary: [C++] Fix int96 comment
 Key: ARROW-5923
 URL: https://issues.apache.org/jira/browse/ARROW-5923
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Micah Kornfield








[jira] [Created] (ARROW-5202) [C++

2019-04-23 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5202:
-

 Summary: [C++
 Key: ARROW-5202
 URL: https://issues.apache.org/jira/browse/ARROW-5202
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-5781) [Archery] Ensure benchmark clone accepts remotes in revision

2019-06-28 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5781:
-

 Summary: [Archery] Ensure benchmark clone accepts remotes in 
revision
 Key: ARROW-5781
 URL: https://issues.apache.org/jira/browse/ARROW-5781
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Affects Versions: 0.13.0
Reporter: Francois Saint-Jacques


Found that ursabot would always compare the PR tip commit with itself via 
https://github.com/apache/arrow/pull/4739#issuecomment-506819250 . This is due 
to the buildbot github behavior of using a local git-reset --hard that changes 
the `master` rev to this new state. 





[jira] [Created] (ARROW-6123) [C++] IsIn kernel should not materialize the output internal

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6123:
-

 Summary: [C++] IsIn kernel should not materialize the output 
internal
 Key: ARROW-6123
 URL: https://issues.apache.org/jira/browse/ARROW-6123
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


It should use the helpers since the output size is known.





[jira] [Created] (ARROW-6121) [Tools] Improve merge tool cli ergonomic

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6121:
-

 Summary: [Tools] Improve merge tool cli ergonomic
 Key: ARROW-6121
 URL: https://issues.apache.org/jira/browse/ARROW-6121
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


* Accepts the pull-request number as an optional (first) parameter to the script
* Supports reading the jira username/password from a file





[jira] [Created] (ARROW-6122) [C++] IsIn kernel must support FixedSizeBinary

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6122:
-

 Summary: [C++] IsIn kernel must support FixedSizeBinary
 Key: ARROW-6122
 URL: https://issues.apache.org/jira/browse/ARROW-6122
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.15.0
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6124) [C++] IsIn kernel should sort in a single pass (with nulls)

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6124:
-

 Summary: [C++] IsIn kernel should sort in a single pass (with 
nulls)
 Key: ARROW-6124
 URL: https://issues.apache.org/jira/browse/ARROW-6124
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.15.0
Reporter: Francois Saint-Jacques


There's a good chance that merge sort must be implemented (spill to disk, 
ChunkedArray, ...)





[jira] [Created] (ARROW-6244) [C++] Implement Partition DataSource

2019-08-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6244:
-

 Summary: [C++] Implement Partition DataSource
 Key: ARROW-6244
 URL: https://issues.apache.org/jira/browse/ARROW-6244
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


This is a DataSource that also has partition metadata. The end goal is to 
support filtering with a DataSelector/Filter expression. The initial 
implementation should not deal with PartitionScheme yet.





[jira] [Created] (ARROW-6242) [C++] Implements basic Dataset/Scanner/ScannerBuilder

2019-08-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6242:
-

 Summary: [C++] Implements basic Dataset/Scanner/ScannerBuilder
 Key: ARROW-6242
 URL: https://issues.apache.org/jira/browse/ARROW-6242
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


The goal of this would be to iterate over a Dataset and generate a "flattened" 
stream of RecordBatches from the union of data sources and data fragments. This 
should not bother with filtering yet.





[jira] [Created] (ARROW-6396) [C++] Add CompareOptions to Compare kernels

2019-08-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6396:
-

 Summary: [C++] Add CompareOptions to Compare kernels
 Key: ARROW-6396
 URL: https://issues.apache.org/jira/browse/ARROW-6396
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


This would add an enum ResolveNull { KLEENE_LOGIC, NULL_PROPAGATE }.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6397) [C++][CI] Fix S3 minio failure

2019-08-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6397:
-

 Summary: [C++][CI] Fix S3 minio failure
 Key: ARROW-6397
 URL: https://issues.apache.org/jira/browse/ARROW-6397
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Continuous Integration
Reporter: Francois Saint-Jacques


See 
[https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27065941/job/gwjmr2hudm7693ef]





[jira] [Created] (ARROW-6448) [CI] Add crossbow notifications

2019-09-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6448:
-

 Summary: [CI] Add crossbow notifications
 Key: ARROW-6448
 URL: https://issues.apache.org/jira/browse/ARROW-6448
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-6378) [C++][Dataset] Implement TreeDataSource

2019-08-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6378:
-

 Summary: [C++][Dataset] Implement TreeDataSource
 Key: ARROW-6378
 URL: https://issues.apache.org/jira/browse/ARROW-6378
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


The TreeDataSource is required to support partition pruning of sub-trees.





[jira] [Created] (ARROW-6341) [Python] Implements low-level bindings to Dataset classes:

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6341:
-

 Summary: [Python] Implements low-level bindings to Dataset classes:
 Key: ARROW-6341
 URL: https://issues.apache.org/jira/browse/ARROW-6341
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Francois Saint-Jacques


The following classes should be accessible from Python:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream.





[jira] [Created] (ARROW-6340) [R] Implements low-level bindings to Dataset classes

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6340:
-

 Summary: [R] Implements low-level bindings to Dataset classes
 Key: ARROW-6340
 URL: https://issues.apache.org/jira/browse/ARROW-6340
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Francois Saint-Jacques


The following classes should be accessible from R:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream.





[jira] [Created] (ARROW-6476) [Java][CI] Travis java all-jdks job is broken

2019-09-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6476:
-

 Summary: [Java][CI] Travis java all-jdks job is broken
 Key: ARROW-6476
 URL: https://issues.apache.org/jira/browse/ARROW-6476
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


Introduced by ARROW-6433, fixing the shade check enabled evaluation of the 
incorrect body. 





[jira] [Created] (ARROW-6605) [C++] Add recursion depth control to fs::Selector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6605:
-

 Summary: [C++] Add recursion depth control to fs::Selector
 Key: ARROW-6605
 URL: https://issues.apache.org/jira/browse/ARROW-6605
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This is similar to the recursive option, but also controls the depth.
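
A rough sketch of what depth control could look like, using C++17 std::filesystem instead of Arrow's filesystem layer. The `Selector` struct below and its `max_recursion` field are hypothetical stand-ins for illustration, not the actual fs::Selector.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for a selector; not Arrow's fs::Selector.
struct Selector {
  fs::path base_dir;
  bool recursive = false;
  // Number of directory levels below base_dir to descend into (assumption).
  int max_recursion = 0;
};

std::vector<fs::path> ListEntries(const Selector& selector) {
  std::vector<fs::path> out;
  if (!selector.recursive) {
    for (const auto& entry : fs::directory_iterator(selector.base_dir)) {
      out.push_back(entry.path());
    }
    return out;
  }
  for (auto it = fs::recursive_directory_iterator(selector.base_dir);
       it != fs::recursive_directory_iterator(); ++it) {
    out.push_back(it->path());
    // depth() is 0 for direct children of base_dir. Once the limit is
    // reached, list the directory itself but do not descend into it.
    if (it->is_directory() && it.depth() >= selector.max_recursion) {
      it.disable_recursion_pending();
    }
  }
  return out;
}
```

With this convention, `max_recursion = 0` lists the same entries as a non-recursive listing, and each increment exposes one more directory level.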



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6606) [C++] Construct tree structure from std::vector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6606:
-

 Summary: [C++] Construct tree structure from 
std::vector
 Key: ARROW-6606
 URL: https://issues.apache.org/jira/browse/ARROW-6606
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


This will be used by FileSystemDataSource for pushdown predicate pruning of 
branches.
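
To make the idea concrete, here is a minimal sketch that builds such a tree from plain '/'-separated path strings. The element type of the vector is not visible above, so `std::vector<std::string>` is an assumption, and `PathNode`, `BuildTree` and `CountFiles` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// One node per path segment; children are keyed by segment name.
struct PathNode {
  std::map<std::string, std::unique_ptr<PathNode>> children;
  bool is_file = false;
};

// Build a tree from '/'-separated paths. Each input path is assumed to name
// a file; intermediate segments become directory nodes.
std::unique_ptr<PathNode> BuildTree(const std::vector<std::string>& paths) {
  auto root = std::make_unique<PathNode>();
  for (const auto& path : paths) {
    PathNode* node = root.get();
    std::istringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/')) {
      if (segment.empty()) continue;  // skip leading or doubled slashes
      auto& child = node->children[segment];
      if (!child) child = std::make_unique<PathNode>();
      node = child.get();
    }
    if (node != root.get()) node->is_file = true;
  }
  return root;
}

// Count the files reachable from a node.
std::size_t CountFiles(const PathNode& node) {
  std::size_t n = node.is_file ? 1 : 0;
  for (const auto& kv : node.children) n += CountFiles(*kv.second);
  return n;
}
```

With this shape, pruning a branch that a predicate rules out (say, a `year=2018` directory) amounts to erasing a single child, which drops every file below it without visiting them.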





[jira] [Created] (ARROW-6615) [C++] Add filtering option to fs::Selector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6615:
-

 Summary: [C++] Add filtering option to fs::Selector
 Key: ARROW-6615
 URL: https://issues.apache.org/jira/browse/ARROW-6615
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


It would be convenient if Selector could support file path filtering, either 
via a regex or globbing applied to the path.

This is semi-required for filtering files in Dataset to properly apply the 
file format.
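
A sketch of the regex variant, operating on an already listed vector of path strings. `FilterPaths` is a hypothetical helper; how such a filter would be wired into Selector is not decided here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Keep only the paths that match `pattern` in full (ECMAScript regex syntax).
std::vector<std::string> FilterPaths(const std::vector<std::string>& paths,
                                     const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> kept;
  for (const auto& path : paths) {
    if (std::regex_match(path, re)) kept.push_back(path);
  }
  return kept;
}
```

For example, the pattern `.*\.parquet` would keep `data/a.parquet` and drop marker files such as `data/_SUCCESS`, which is the kind of filtering a Dataset needs before applying a file format.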





[jira] [Created] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6614:
-

 Summary: [C++][Dataset] Implement FileSystemDataSourceDiscovery
 Key: ARROW-6614
 URL: https://issues.apache.org/jira/browse/ARROW-6614
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


DataSourceDiscovery is what allows inferring the Schema and constructing a 
DataSource with a PartitionScheme.





[jira] [Created] (ARROW-6161) [C++] Implements dataset::ParquetFile and associated Scan structures

2019-08-07 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6161:
-

 Summary: [C++] Implements dataset::ParquetFile and associated Scan 
structures
 Key: ARROW-6161
 URL: https://issues.apache.org/jira/browse/ARROW-6161
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6148) Missing debian build dependencies

2019-08-06 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6148:
-

 Summary: Missing debian build dependencies
 Key: ARROW-6148
 URL: https://issues.apache.org/jira/browse/ARROW-6148
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6730) [CI] Use Github Actions for "C++ with clang 7" docker image

2019-09-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6730:
-

 Summary: [CI] Use Github Actions for "C++ with clang 7" docker 
image
 Key: ARROW-6730
 URL: https://issues.apache.org/jira/browse/ARROW-6730
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6769) [C++][Dataset] End to End dataset integration test case

2019-10-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6769:
-

 Summary: [C++][Dataset] End to End dataset integration test case
 Key: ARROW-6769
 URL: https://issues.apache.org/jira/browse/ARROW-6769
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


1. Create a DataSource from a known directory and a PartitionScheme. 
2. Create a Dataset from the previous DataSource. 
3. Request a ScannerBuilder from previous Dataset. 
4. Add filter expression to ScannerBuilder (and other options). 
5. Finalize into a Scan operation. 
6. Materialize into an arrow::Table.

 





[jira] [Created] (ARROW-7148) [C++][Dataset] API cleanup

2019-11-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7148:
-

 Summary: [C++][Dataset] API cleanup
 Key: ARROW-7148
 URL: https://issues.apache.org/jira/browse/ARROW-7148
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0








[jira] [Created] (ARROW-7178) [C++] Vendor forward compatible std::optional

2019-11-15 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7178:
-

 Summary: [C++] Vendor forward compatible std::optional
 Key: ARROW-7178
 URL: https://issues.apache.org/jira/browse/ARROW-7178
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Having std::optional was mentioned a few times, [~emkornfi...@gmail.com] 
suggested https://github.com/martinmoene/optional-lite





[jira] [Created] (ARROW-7079) [C++][Dataset] Implement ScalarAsStatisctics for non-primitive types

2019-11-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7079:
-

 Summary: [C++][Dataset] Implement ScalarAsStatisctics for 
non-primitive types
 Key: ARROW-7079
 URL: https://issues.apache.org/jira/browse/ARROW-7079
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


Statistics are not extracted for the following (parquet) types

- BYTE_ARRAY
- FLBA
- Any logical timestamps/dates





[jira] [Created] (ARROW-7360) [R] Can't use dataset's filter with non-literal expression

2019-12-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7360:
-

 Summary: [R] Can't use dataset's filter with non-literal expression
 Key: ARROW-7360
 URL: https://issues.apache.org/jira/browse/ARROW-7360
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Francois Saint-Jacques


The following will generate an error


{code:r}
test_that("filtering with expression", {
  char_sym <- "b"   
  expect_dplyr_equal(   
input %>%   
  filter(chr == char_sym) %>%   
  select(string = chr, int) %>% 
  collect(),
tbl 
  ) 
})  

{code}






[jira] [Created] (ARROW-7390) [C++][Dataset] Concurrency race in Projector::Project

2019-12-13 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7390:
-

 Summary: [C++][Dataset] Concurrency race in Projector::Project 
 Key: ARROW-7390
 URL: https://issues.apache.org/jira/browse/ARROW-7390
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


When a DataFragment is invoked by 2 scan tasks of the same DataFragment, 
there's a race to invoke SetInputSchema. Note that ResizeMissingColumns also 
suffers from this race. The ideal goal is to make Project a const method.
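
To illustrate the "const method" goal in isolation, here is a toy sketch with stand-in types. `Schema`, `Batch` and this `Projector` are hypothetical simplifications, not the dataset classes: the target schema is fixed at construction and every per-call value lives in a local, so concurrent calls share no mutable state.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in types for illustration only.
using Schema = std::vector<std::string>;                 // column names
using Batch = std::map<std::string, std::vector<int>>;   // name -> values

class Projector {
 public:
  explicit Projector(Schema to) : to_(std::move(to)) {}

  // const: nothing is cached between calls, so two scan tasks can call
  // Project on the same Projector at the same time.
  Batch Project(const Batch& batch, std::size_t num_rows) const {
    Batch out;
    for (const auto& name : to_) {
      auto it = batch.find(name);
      // Missing columns are filled with zeros here as a stand-in for nulls.
      out[name] = (it != batch.end()) ? it->second
                                      : std::vector<int>(num_rows, 0);
    }
    return out;
  }

 private:
  const Schema to_;
};
```

The cost of this shape is that any per-input-schema setup is redone on each call; caching it safely would need synchronization or a per-task copy.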





[jira] [Created] (ARROW-7380) [C++][Dataset] Implement DatasetDiscovery

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7380:
-

 Summary: [C++][Dataset] Implement DatasetDiscovery
 Key: ARROW-7380
 URL: https://issues.apache.org/jira/browse/ARROW-7380
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


Takes a list of DataSourceDiscovery and yields a Dataset.





[jira] [Created] (ARROW-7379) [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7379:
-

 Summary: [C++] Introduce Field::CompatiblesWith and 
Schema::CompatiblesWith
 Key: ARROW-7379
 URL: https://issues.apache.org/jira/browse/ARROW-7379
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


The methods verify whether fields/schemas are compatible with regard to naming 
and type. This is partly extracted from `UnifySchemas`.





[jira] [Created] (ARROW-7338) [C++] Rename SimpleDataSource to InMemoryDataSource

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7338:
-

 Summary: [C++] Rename SimpleDataSource to InMemoryDataSource
 Key: ARROW-7338
 URL: https://issues.apache.org/jira/browse/ARROW-7338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


The constructor should take a generator

{code:c++}
// Some comments here
class InMemoryDataSource : public DataSource {
  public:
using Generator = std::function>;

InMemoryDataSource(Generator&& generator);
// Convenience constructor to support a fixed list of RecordBatch
InMemoryDataSource(std::shared_ptr<RecordBatch>);
InMemoryDataSource(std::vector<std::shared_ptr<RecordBatch>>);

private:
  Generator generator;
}
{code}
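
For illustration, a self-contained sketch of the generator idea with stand-in types. The exact `Generator` signature is not recoverable from the snippet above, so a callable returning a vector of batches is assumed here; `RecordBatch`, `BatchVector` and `Scan` are hypothetical simplifications.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct RecordBatch { int num_rows; };  // stand-in, not arrow::RecordBatch
using BatchVector = std::vector<std::shared_ptr<RecordBatch>>;

class InMemoryDataSource {
 public:
  // Assumption: a generator is a callable that yields the batches each time
  // it is invoked, so the source can be scanned repeatedly.
  using Generator = std::function<BatchVector()>;

  explicit InMemoryDataSource(Generator generator)
      : generator_(std::move(generator)) {}

  // Convenience constructors wrapping a fixed list of batches in a generator.
  explicit InMemoryDataSource(BatchVector batches)
      : generator_([batches = std::move(batches)] { return batches; }) {}
  explicit InMemoryDataSource(std::shared_ptr<RecordBatch> batch)
      : InMemoryDataSource(BatchVector{std::move(batch)}) {}

  BatchVector Scan() const { return generator_(); }

 private:
  Generator generator_;
};
```

The convenience constructors reduce to the generator form, so only one code path has to be maintained.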






[jira] [Created] (ARROW-7339) [CMake] Thrift version not respected in CMake configuration version.txt

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7339:
-

 Summary: [CMake] Thrift version not respected in CMake 
configuration version.txt
 Key: ARROW-7339
 URL: https://issues.apache.org/jira/browse/ARROW-7339
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


If thrift is requested via BUNDLED, thrift 0.9.1 will be downloaded instead of 
the requested version. This is due to FindThrift.cmake overriding 
THRIFT_VERSION with the version of the locally installed thrift compiler 
(0.9.1 on Ubuntu 18.04).





[jira] [Created] (ARROW-7007) [C++] Enable mmap option for LocalFs

2019-10-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7007:
-

 Summary: [C++] Enable mmap option for LocalFs
 Key: ARROW-7007
 URL: https://issues.apache.org/jira/browse/ARROW-7007
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6953) [C++][Dataset] Implement Gandiva Filter/Projector in Scanner

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6953:
-

 Summary: [C++][Dataset] Implement Gandiva Filter/Projector in 
Scanner
 Key: ARROW-6953
 URL: https://issues.apache.org/jira/browse/ARROW-6953
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Currently, we have `RecordBatchProjector` and `ExpressionEvaluator` to achieve 
this feature. This would implement a single class that fuses both and uses 
Gandiva. This would be exposed in the ScannerBuilder via an option.





[jira] [Created] (ARROW-6956) [C++] Status should use unique_ptr

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6956:
-

 Summary: [C++] Status should use unique_ptr
 Key: ARROW-6956
 URL: https://issues.apache.org/jira/browse/ARROW-6956
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The logic of Status::State is _very_ similar to unique_ptr, except that 
copying a Status performs a deep copy of the state.
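
As a rough sketch of what this could look like (MiniStatus and StatusCode below are simplified, hypothetical stand-ins, not the real arrow::Status), the state can live in a unique_ptr, with only the copy operations written by hand to clone it:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for Arrow's status codes.
enum class StatusCode { OK, Invalid, IOError };

class MiniStatus {
 public:
  // A null state means OK, so the success path allocates nothing.
  MiniStatus() noexcept = default;

  MiniStatus(StatusCode code, std::string msg)
      : state_(new State{code, std::move(msg)}) {
    assert(code != StatusCode::OK && "use the default constructor for OK");
  }

  // unique_ptr is move-only, so copies clone the state explicitly.
  MiniStatus(const MiniStatus& other)
      : state_(other.state_ ? new State(*other.state_) : nullptr) {}

  MiniStatus& operator=(const MiniStatus& other) {
    if (this != &other) {
      state_.reset(other.state_ ? new State(*other.state_) : nullptr);
    }
    return *this;
  }

  // Moves come for free from unique_ptr and leave the source as OK.
  MiniStatus(MiniStatus&&) noexcept = default;
  MiniStatus& operator=(MiniStatus&&) noexcept = default;

  bool ok() const { return state_ == nullptr; }
  StatusCode code() const { return ok() ? StatusCode::OK : state_->code; }
  std::string message() const { return ok() ? std::string() : state_->msg; }

 private:
  struct State {
    StatusCode code;
    std::string msg;
  };
  std::unique_ptr<State> state_;
};
```

With this layout the destructor and both move operations are generated by the compiler, which is most of the boilerplate the issue hints at.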





[jira] [Created] (ARROW-6950) [C++][Dataset] Add example/benchmark for reading parquet files with dataset

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6950:
-

 Summary: [C++][Dataset] Add example/benchmark for reading parquet 
files with dataset
 Key: ARROW-6950
 URL: https://issues.apache.org/jira/browse/ARROW-6950
 Project: Apache Arrow
  Issue Type: Test
  Components: C++
Reporter: Francois Saint-Jacques


Create an executable that loads a directory with a known partition scheme, 
applying a filter and a projection. This will be used as a baseline for future 
performance improvements, and also to show various features of the dataset API.





[jira] [Created] (ARROW-6951) [C++][Dataset] Ensure column projection is passed to ParquetDataFragment

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6951:
-

 Summary: [C++][Dataset] Ensure column projection is passed to 
ParquetDataFragment
 Key: ARROW-6951
 URL: https://issues.apache.org/jira/browse/ARROW-6951
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6952) [C++][Dataset] Ensure expression filter is passed ParquetDataFragment

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6952:
-

 Summary: [C++][Dataset] Ensure expression filter is passed 
ParquetDataFragment
 Key: ARROW-6952
 URL: https://issues.apache.org/jira/browse/ARROW-6952
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


We should be able to prune RowGroups based on the expression and the statistics.
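
To illustrate the pruning idea, here is a small standalone sketch. Everything in it is hypothetical: RowGroupStats models min/max statistics for a single int64 column, and Predicate models a filter of the form "column <op> value". It is not the Parquet or Dataset API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics of one int64 column in one row group.
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

enum class CompareOp { Equal, Less, Greater };

// Hypothetical filter of the form `column <op> value`.
struct Predicate {
  CompareOp op;
  int64_t value;
};

// Returns true if some value in [min, max] could satisfy the predicate.
// A false result proves the row group can be skipped without reading it.
bool MayMatch(const RowGroupStats& stats, const Predicate& predicate) {
  assert(stats.min <= stats.max);
  switch (predicate.op) {
    case CompareOp::Equal:
      return stats.min <= predicate.value && predicate.value <= stats.max;
    case CompareOp::Less:
      return stats.min < predicate.value;
    case CompareOp::Greater:
      return stats.max > predicate.value;
  }
  return true;  // Unknown operator: keep the row group to stay correct.
}

// Returns the indices of the row groups that still need to be scanned.
std::vector<std::size_t> SelectRowGroups(
    const std::vector<RowGroupStats>& groups, const Predicate& predicate) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    if (MayMatch(groups[i], predicate)) selected.push_back(i);
  }
  return selected;
}
```

The check is conservative: statistics can only rule a row group out, so the filter still has to be evaluated on the rows of every group that survives.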





[jira] [Created] (ARROW-6902) [C++] Add String*/Binary* support for Compare kernels

2019-10-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6902:
-

 Summary: [C++] Add String*/Binary* support for Compare kernels
 Key: ARROW-6902
 URL: https://issues.apache.org/jira/browse/ARROW-6902
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6987) [CI] Travis OSX failing to install sdk headers

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6987:
-

 Summary: [CI] Travis OSX failing to install sdk headers
 Key: ARROW-6987
 URL: https://issues.apache.org/jira/browse/ARROW-6987
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


{code:java}
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
installer: Package name is macOS_SDK_headers_for_macOS_10.14
installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override.
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during .
{code}
See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345]





[jira] [Created] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6988:
-

 Summary: [CI][R] Buildbot's R Conda is failing
 Key: ARROW-6988
 URL: https://issues.apache.org/jira/browse/ARROW-6988
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:java}
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled && 
!is.null(test)) {skip_empty()}}, expectation = handle_expectation, 
skip = handle_skip, warning = handle_warning, message = handle_message, 
error = handle_error), error = handle_fatal, skip = function(e) {})
  26: test_code(NULL, exprs, env)
  27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
  28: force(code)
  29: with_reporter(reporter = reporter, start_end_reporter = 
start_end_reporter, {reporter$start_file(basename(path))
lister$start_file(basename(path))source_file(path, new.env(parent = 
env), chdir = TRUE, wrap = wrap)reporter$.end_context() 
   reporter$end_file()})
  30: FUN(X[[i]], ...)
  31: lapply(paths, test_file, env = env, reporter = current_reporter, 
start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
  32: force(code)
  33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE,  
   load_helpers = FALSE, wrap = wrap))
  34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
  35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
stop_on_warning, wrap = wrap)
  36: test_package_dir(package = package, test_path = test_path, filter = 
filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
stop_on_warning = stop_on_warning, wrap = wrap)
  37: test_check("arrow")
  An irrecoverable exception occurred. R is aborting now ...
  Segmentation fault (core dumped)
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: 1 ERROR, 1 WARNING, 2 NOTEs
See
  ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
for details.
 {code}
[https://ci.ursalabs.org/#/builders/95/builds/2386]
[https://ci.ursalabs.org/#/builders/95]





[jira] [Created] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types

2019-10-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7017:
-

 Summary: [C++] Refactor AddKernel to support other operations and 
types
 Key: ARROW-7017
 URL: https://issues.apache.org/jira/browse/ARROW-7017
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Compute
Reporter: Francois Saint-Jacques


* Should avoid using builders (and/or NULLs) since the output shape is known at 
compute time.
 * Should be refactored to support other operations, e.g. Subtraction, 
Multiplication.
 * Should have an overflow/underflow detection mode.
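
The points above can be sketched in a few lines of standalone C++. This is only an illustration under simplifying assumptions: it works on plain std::vector<int32_t> rather than Arrow arrays, ignores nulls entirely, and ArithOp, CheckedApply and ApplyKernel are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

enum class ArithOp { Add, Subtract, Multiply };

// Returns false when the exact result does not fit in int32_t.
bool CheckedApply(ArithOp op, int32_t lhs, int32_t rhs, int32_t* out) {
  // Compute in 64 bits: any int32 sum, difference or product fits in int64.
  int64_t wide = 0;
  switch (op) {
    case ArithOp::Add:
      wide = int64_t{lhs} + rhs;
      break;
    case ArithOp::Subtract:
      wide = int64_t{lhs} - rhs;
      break;
    case ArithOp::Multiply:
      wide = int64_t{lhs} * rhs;
      break;
  }
  if (wide < std::numeric_limits<int32_t>::min() ||
      wide > std::numeric_limits<int32_t>::max()) {
    return false;  // Overflow or underflow detected.
  }
  *out = static_cast<int32_t>(wide);
  return true;
}

// Applies `op` element-wise. The output length is known up front, so the
// buffer is sized once instead of being grown through a builder.
bool ApplyKernel(ArithOp op, const std::vector<int32_t>& lhs,
                 const std::vector<int32_t>& rhs, std::vector<int32_t>* out) {
  assert(lhs.size() == rhs.size());
  out->assign(lhs.size(), 0);
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!CheckedApply(op, lhs[i], rhs[i], &(*out)[i])) return false;
  }
  return true;
}
```

Widening to 64 bits keeps the check simple for 32-bit inputs; a 64-bit kernel would need a different detection strategy, such as compiler overflow intrinsics.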





[jira] [Created] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6964:
-

 Summary: [C++][Dataset] Expose a nested parallel option for Scanner
 Key: ARROW-6964
 URL: https://issues.apache.org/jira/browse/ARROW-6964
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6965) [C++][Dataset] Optionally expose partition keys as materialized columns

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6965:
-

 Summary: [C++][Dataset] Optionally expose partition keys as 
materialized columns
 Key: ARROW-6965
 URL: https://issues.apache.org/jira/browse/ARROW-6965
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly load file

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6969:
-

 Summary: [C++][Dataset] ParquetScanTask eagerly load file 
 Key: ARROW-6969
 URL: https://issues.apache.org/jira/browse/ARROW-6969
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The file content should only be read when invoking ParquetScanTask::Scan, not 
on construction. Loading eagerly prevents reading in a true streaming fashion 
under memory constraints.
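
A minimal sketch of the lazy behaviour, using a hypothetical LazyScanTask that is not the real ParquetScanTask: the constructor only stores a loader callable (an assumed stand-in for opening and decoding a file), and the work happens when Scan() is called.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical scan task: rows are modelled as plain ints for brevity.
class LazyScanTask {
 public:
  // Stand-in for "open the file and decode it".
  using Loader = std::function<std::vector<int>()>;

  // Construction is cheap: nothing is read here.
  explicit LazyScanTask(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ && "loader must be callable");
  }

  // The file content is only materialized when the task is scanned.
  std::vector<int> Scan() const { return loader_(); }

 private:
  Loader loader_;
};
```

Because each task holds only a callable, a scanner can create tasks for every file up front and keep memory bounded by limiting how many are scanned at once.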





[jira] [Created] (ARROW-7210) [C++] Scalar cast should support time-based types

2019-11-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7210:
-

 Summary: [C++] Scalar cast should support time-based types
 Key: ARROW-7210
 URL: https://issues.apache.org/jira/browse/ARROW-7210
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This would allow supporting a minimum of expression evaluation on time-based 
arrays.
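
As an illustration of the kind of cast involved, here is a standalone sketch that converts a timestamp value between units. The names are hypothetical (TimeUnit mirrors the usual second/milli/micro/nano units, CastTimestamp is not an Arrow function), and rejecting lossy conversions is a policy chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

enum class TimeUnit { Second, Milli, Micro, Nano };

// Number of ticks of the given unit in one second.
int64_t TicksPerSecond(TimeUnit unit) {
  switch (unit) {
    case TimeUnit::Second: return 1;
    case TimeUnit::Milli: return 1000;
    case TimeUnit::Micro: return 1000000;
    case TimeUnit::Nano: return 1000000000;
  }
  return 1;
}

// Casts `value` from one unit to another. Returns false if the result
// would overflow int64 or if the conversion would drop precision.
bool CastTimestamp(int64_t value, TimeUnit from, TimeUnit to, int64_t* out) {
  assert(out != nullptr);
  const int64_t from_ticks = TicksPerSecond(from);
  const int64_t to_ticks = TicksPerSecond(to);
  if (to_ticks >= from_ticks) {
    // Going to a finer (or equal) unit: multiply, guarding against overflow.
    const int64_t factor = to_ticks / from_ticks;
    if (value > std::numeric_limits<int64_t>::max() / factor ||
        value < std::numeric_limits<int64_t>::min() / factor) {
      return false;
    }
    *out = value * factor;
    return true;
  }
  // Going to a coarser unit: divide, refusing to truncate.
  const int64_t divisor = from_ticks / to_ticks;
  if (value % divisor != 0) return false;
  *out = value / divisor;
  return true;
}
```

With such a cast, a scalar in one unit can be compared against an array stored in another unit by first bringing the scalar to the array's unit.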





[jira] [Created] (ARROW-7265) [Format][C++] Clarify the usage of typeIds in Union type documentation

2019-11-26 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7265:
-

 Summary: [Format][C++] Clarify the usage of typeIds in Union type 
documentation
 Key: ARROW-7265
 URL: https://issues.apache.org/jira/browse/ARROW-7265
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The documentation is unclear.




