[jira] [Created] (ARROW-9108) [C++][Dataset] Add Parquet Statistics conversion for timestamp columns

2020-06-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9108:
-

 Summary: [C++][Dataset] Add Parquet Statistics conversion for 
timestamp columns
 Key: ARROW-9108
 URL: https://issues.apache.org/jira/browse/ARROW-9108
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9107) [C++][Dataset] Time-based types support

2020-06-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9107:
-

 Summary: [C++][Dataset] Time-based types support
 Key: ARROW-9107
 URL: https://issues.apache.org/jira/browse/ARROW-9107
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


We lack support for date/timestamp partitions and for the corresponding 
predicate pushdown rules. Timestamp columns are usually the most important 
predicate in OLAP-style queries, so we need to support this transparently.
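
To illustrate the kind of pushdown rule meant here, a minimal sketch follows. It is not Arrow's API: `TimestampRange`, `MayMatch` and `PrunePartitions` are hypothetical names, and timestamps are assumed to be microseconds since the epoch. A partition is skipped when its [min, max] range cannot intersect the half-open query interval [lower, upper).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: inclusive timestamp bounds of one partition (or row group),
// in microseconds since the Unix epoch.
struct TimestampRange {
  int64_t min_us;
  int64_t max_us;
};

// True if some row in `range` could satisfy lower_us <= ts < upper_us.
inline bool MayMatch(const TimestampRange& range, int64_t lower_us,
                     int64_t upper_us) {
  return range.max_us >= lower_us && range.min_us < upper_us;
}

// Returns the positions of the partitions that still need to be scanned.
inline std::vector<std::size_t> PrunePartitions(
    const std::vector<TimestampRange>& partitions, int64_t lower_us,
    int64_t upper_us) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < partitions.size(); ++i) {
    if (MayMatch(partitions[i], lower_us, upper_us)) kept.push_back(i);
  }
  return kept;
}
```

The same range test would apply whether the bounds come from a partition expression or from per-file statistics.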





[jira] [Created] (ARROW-9068) [C++][Dataset] Simplify Partitioning interface

2020-06-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9068:
-

 Summary: [C++][Dataset] Simplify Partitioning interface
 Key: ARROW-9068
 URL: https://issues.apache.org/jira/browse/ARROW-9068
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques


The `int segment` of `Partitioning::Parse` should not be exposed to the user. 
KeyValuePartitioning should be a private Impl interface, not in public headers. 

The same applies to `Partitioning::Format`.





[jira] [Created] (ARROW-9028) [R] Should be able to convert an empty table

2020-06-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-9028:
-

 Summary: [R] Should be able to convert an empty table
 Key: ARROW-9028
 URL: https://issues.apache.org/jira/browse/ARROW-9028
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-8997) [Archery] Benchmark formatter should have friendly units

2020-06-01 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8997:
-

 Summary: [Archery] Benchmark formatter should have friendly units
 Key: ARROW-8997
 URL: https://issues.apache.org/jira/browse/ARROW-8997
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


The current output is not friendly to glance at. Usage of humanfriendly can 
help here.





[jira] [Created] (ARROW-8986) [Archery][ursabot] Fix benchmark diff checkout of origin/master

2020-05-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8986:
-

 Summary: [Archery][ursabot] Fix benchmark diff checkout of 
origin/master
 Key: ARROW-8986
 URL: https://issues.apache.org/jira/browse/ARROW-8986
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


https://github.com/apache/arrow/pull/7300#issuecomment-635967095





[jira] [Created] (ARROW-8890) [R] Fix C++ lint issue

2020-05-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8890:
-

 Summary: [R] Fix C++ lint issue 
 Key: ARROW-8890
 URL: https://issues.apache.org/jira/browse/ARROW-8890
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 1.0.0








[jira] [Created] (ARROW-8884) [C++] Listing files with S3FileSystem is slow

2020-05-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8884:
-

 Summary: [C++] Listing files with S3FileSystem is slow
 Key: ARROW-8884
 URL: https://issues.apache.org/jira/browse/ARROW-8884
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Listing files on S3 is slow due to the recursive nature of the algorithm.

The following change modifies the behavior of the S3Result to include all 
objects but no "grouping" (directories). This dramatically lowers the number of 
HTTP calls. 
{code:c++}
diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
index 70c87f46ec..98a40b17a2 100644
--- a/cpp/src/arrow/filesystem/s3fs.cc
+++ b/cpp/src/arrow/filesystem/s3fs.cc
@@ -986,7 +986,7 @@ class S3FileSystem::Impl {
 if (!prefix.empty()) {
   req.SetPrefix(ToAwsString(prefix) + kSep);
 }
-req.SetDelimiter(Aws::String() + kSep);
+// req.SetDelimiter(Aws::String() + kSep);
 req.SetMaxKeys(kListObjectsMaxKeys);
 
 while (true) {

{code}
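
To make the request count concrete, here is a small in-memory simulation. It is only a sketch: `FakeBucket`, `ListDelimited`, `ListFlat` and `ListRecursive` are hypothetical names, each `List*` call stands for one HTTP request, and pagination (`kListObjectsMaxKeys`) is ignored.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for a bucket; every List* call models one HTTP request.
struct FakeBucket {
  std::vector<std::string> keys;
  std::size_t requests = 0;

  // Delimited listing: direct children of `prefix` only; deeper keys are
  // reported once as a common prefix ("directory").
  void ListDelimited(const std::string& prefix, std::vector<std::string>* files,
                     std::set<std::string>* dirs) {
    ++requests;
    for (const auto& key : keys) {
      if (key.compare(0, prefix.size(), prefix) != 0) continue;
      const std::size_t slash = key.find('/', prefix.size());
      if (slash == std::string::npos) {
        files->push_back(key);
      } else {
        dirs->insert(key.substr(0, slash + 1));
      }
    }
  }

  // Flat listing: every key under `prefix`, no grouping.
  void ListFlat(const std::string& prefix, std::vector<std::string>* files) {
    ++requests;
    for (const auto& key : keys) {
      if (key.compare(0, prefix.size(), prefix) == 0) files->push_back(key);
    }
  }
};

// Recursive walk: one delimited request per directory.
inline void ListRecursive(FakeBucket* bucket, const std::string& prefix,
                          std::vector<std::string>* files) {
  std::set<std::string> dirs;
  bucket->ListDelimited(prefix, files, &dirs);
  for (const auto& dir : dirs) ListRecursive(bucket, dir, files);
}
```

With keys `a/1`, `a/b/2`, `a/b/3` and `c/4`, the recursive walk issues one request per directory (root, `a/`, `a/b/`, `c/`), while the flat listing returns the same four keys in a single request.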






[jira] [Created] (ARROW-8874) [C++][Dataset] Scanner::ToTable race when ScanTask exit early with an error

2020-05-20 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8874:
-

 Summary: [C++][Dataset] Scanner::ToTable race when ScanTask exit 
early with an error
 Key: ARROW-8874
 URL: https://issues.apache.org/jira/browse/ARROW-8874
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


https://github.com/apache/arrow/pull/7180#issuecomment-631059751

The issue is that when 
[Finish|https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/scanner.cc#L184-L208]
 exits early due to a ScanTask error, in-flight tasks may try to lock the 
out-of-scope mutex.





[jira] [Created] (ARROW-8720) [C++] Fix checked_pointer_cast

2020-05-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8720:
-

 Summary: [C++] Fix checked_pointer_cast
 Key: ARROW-8720
 URL: https://issues.apache.org/jira/browse/ARROW-8720
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


While investigating performance, I noted that dynamic_cast (and internal RTTI 
methods) were showing up in the "hot" functions for release builds.
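
As I read it, the intent is that a checked cast should only pay for RTTI in debug builds. A minimal sketch under that assumption follows; `checked_pointer_cast_sketch` and the two `*Sketch` types are hypothetical names used for illustration, not the actual Arrow code.

```cpp
#include <memory>
#include <utility>

// Sketch of a checked cast: verify the dynamic type in debug builds,
// skip the RTTI lookup entirely in release builds (NDEBUG defined).
template <typename T, typename U>
std::shared_ptr<T> checked_pointer_cast_sketch(std::shared_ptr<U> ptr) noexcept {
#ifdef NDEBUG
  return std::static_pointer_cast<T>(std::move(ptr));
#else
  return std::dynamic_pointer_cast<T>(std::move(ptr));
#endif
}

// Hypothetical polymorphic hierarchy used only to exercise the cast.
struct DataTypeSketch {
  virtual ~DataTypeSketch() = default;
};
struct Int32TypeSketch : DataTypeSketch {
  int bit_width() const { return 32; }
};
```

In a debug build a wrong cast yields a null pointer that can be caught early; in a release build the cast compiles down to a pointer adjustment.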





[jira] [Created] (ARROW-8604) [R] Windows compilation failure

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8604:
-

 Summary: [R] Windows compilation failure
 Key: ARROW-8604
 URL: https://issues.apache.org/jira/browse/ARROW-8604
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0








[jira] [Created] (ARROW-8603) [Documentation] Fix Sphinx doxygen comment

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8603:
-

 Summary: [Documentation] Fix Sphinx doxygen comment
 Key: ARROW-8603
 URL: https://issues.apache.org/jira/browse/ARROW-8603
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Documentation
Reporter: Francois Saint-Jacques


See [https://github.com/apache/arrow/runs/622393532]





[jira] [Created] (ARROW-8602) [CMake] Fix ws2_32 link issue when cross-compiling on Linux

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8602:
-

 Summary: [CMake] Fix ws2_32 link issue when cross-compiling on 
Linux
 Key: ARROW-8602
 URL: https://issues.apache.org/jira/browse/ARROW-8602
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-8601) [Go][Flight] Implement Flight Writer interface

2020-04-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8601:
-

 Summary: [Go][Flight] Implement Flight Writer interface
 Key: ARROW-8601
 URL: https://issues.apache.org/jira/browse/ARROW-8601
 Project: Apache Arrow
  Issue Type: Improvement
  Components: FlightRPC, Go
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-8497) [Archery] Add missing component to builds

2020-04-17 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8497:
-

 Summary: [Archery] Add missing component to builds
 Key: ARROW-8497
 URL: https://issues.apache.org/jira/browse/ARROW-8497
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery, Developer Tools
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0








[jira] [Created] (ARROW-8488) [R] Replace VALUE_OR_STOP with ValueOrStop

2020-04-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8488:
-

 Summary: [R] Replace VALUE_OR_STOP with ValueOrStop
 Key: ARROW-8488
 URL: https://issues.apache.org/jira/browse/ARROW-8488
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


We should avoid macros as much as possible, as per the style guide.





[jira] [Created] (ARROW-8448) [Package] Can't build apt packages with ubuntu-focal

2020-04-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8448:
-

 Summary: [Package] Can't build apt packages with ubuntu-focal
 Key: ARROW-8448
 URL: https://issues.apache.org/jira/browse/ARROW-8448
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Francois Saint-Jacques
Assignee: Kouhei Sutou


While trying to debug the failing nightly (due to disk space), I encountered 
the following error: the tar generated by the build script does not conform to 
what debuilder expects. It blocks
{code}
Successfully built ecdda7ea015d
Successfully tagged apache-arrow-ubuntu-focal:latest
docker run --rm --tty --volume 
/home/fsaintjacques/src/db/arrow/dev/tasks/linux-packages/apache-arrow/apt:/host:rw
 --env DEBUG=yes apache-arrow-ubuntu-focal /host/build.sh
This package has a Debian revision number but there does not seem to be
an appropriate original tar file or .orig directory in the parent directory;
(expected one of apache-arrow_0.16.0.orig.tar.gz, 
apache-arrow_0.16.0.orig.tar.bz2,
apache-arrow_0.16.0.orig.tar.lzma,  apache-arrow_0.16.0.orig.tar.xz or 
apache-arrow-1.0.0~dev20200414.orig)
continue anyway? (y/n) 

{code}





[jira] [Created] (ARROW-8447) [C++][Dataset] Ensure Scanner::ToTable preserve ordering

2020-04-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8447:
-

 Summary: [C++][Dataset] Ensure Scanner::ToTable preserve ordering
 Key: ARROW-8447
 URL: https://issues.apache.org/jira/browse/ARROW-8447
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This can be refactored with a little effort in Scanner::ToTable:

# Change `batches` to `std::vector`
# When pushing the closure to the TaskGroup, also track an incrementing 
integer, e.g. scan_task_id
# In the closure, store the RecordBatches for this ScanTask in a local vector, 
when all batches are consumed, move the local vector in the `batches` at the 
right index, resizing and emplacing with mutex
# After waiting for the task group completion either
* Concatenate into a single vector and call `Table::FromRecordBatch` or
* Write a RecordBatchReader that supports vector and add 
method `Table::FromRecordBatchReader`

The latter involves more work but is the cleaner way; the other FromRecordBatch 
method can be implemented from it and support "streaming".
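
The slot-per-task idea in the steps above can be sketched as follows. This is an illustration only: `OrderedCollector`, `Emplace` and `Flatten` are hypothetical names, and `Batch` is a string standing in for a RecordBatch.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Batch = std::string;  // stand-in for a RecordBatch
using BatchVector = std::vector<Batch>;

class OrderedCollector {
 public:
  // Called by a scan task once all of its batches have been consumed.
  // `scan_task_id` is the incrementing integer assigned at submission time.
  void Emplace(std::size_t scan_task_id, BatchVector local) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (batches_.size() <= scan_task_id) batches_.resize(scan_task_id + 1);
    batches_[scan_task_id] = std::move(local);
  }

  // Called after the task group has completed: concatenate in task order.
  BatchVector Flatten() {
    std::lock_guard<std::mutex> lock(mutex_);
    BatchVector out;
    for (auto& per_task : batches_) {
      for (auto& batch : per_task) out.push_back(std::move(batch));
    }
    batches_.clear();
    return out;
  }

 private:
  std::mutex mutex_;
  std::vector<BatchVector> batches_;
};
```

Because each task writes only to its own slot, the completion order of the tasks no longer affects the order of the flattened result.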





[jira] [Created] (ARROW-8382) [C++][Dataset] Refactor WritePlan to decouple from Fragment/Scan/Partition classes

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8382:
-

 Summary: [C++][Dataset] Refactor WritePlan to decouple from 
Fragment/Scan/Partition classes 
 Key: ARROW-8382
 URL: https://issues.apache.org/jira/browse/ARROW-8382
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


WritePlan should look like the following. 

{code:c++}
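class ARROW_DS_EXPORT WritePlan {
 public:
  /// Execute the WritePlan and return a FileSystemDataset as a result.
  Result<std::shared_ptr<FileSystemDataset>> Execute();

 protected:
  /// The schema of the Dataset which will be written
  std::shared_ptr<Schema> schema;

  /// The format into which fragments will be written
  std::shared_ptr<FileFormat> format;

  /// A destination paired with the reader producing its batches (element
  /// types assumed from the FileFormat::Write parameters discussed below).
  using SourceAndReader = std::pair<FileSource, std::shared_ptr<RecordBatchReader>>;
  /// One entry per file to write
  std::vector<SourceAndReader> outputs;
};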
{code}

* Refactor FileFormat::Write(FileSource destination, RecordBatchReader). Not 
sure whether it should take the output schema, or whether the RecordBatchReader 
should already be of the right schema.
* Add a class/function that constructs SourceAndReader from Fragments, 
Partitioning and base path, and remove any Write/Fragment logic from 
partition.cc.
* Move Write() out of FileSystemDataset into WritePlan. It could take a 
FileSystemDatasetFactory to recreate the FileSystemDataset. This is a bonus, 
not a requirement.
* Simplify the writing routine to avoid the PathTree directory structure; it 
shouldn't be more complex than `for task in write_tasks: task()`. No path 
construction should happen there.

The effects are:
* Simplified WritePlan execution, abstracted away from path construction, and 
can write to multiple FileSystem and/or Buffers since it doesn't construct the 
FileSource.
* By the virtue of using RecordBatchReader instead of Fragment, it isn't tied 
to writing from Fragment, it can take any construct that yields a 
RecordBatchReader. It also means that WritePlan doesn't have to know about any 
Scan related classes.
* Writing can be done with or without partitioning, this logic is given to 
whomever generates the SourceAndReader list.
* Should be simpler to test.





[jira] [Created] (ARROW-8381) [C++][Dataset] Dataset writing should require a writer schema

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8381:
-

 Summary: [C++][Dataset] Dataset writing should require a writer 
schema
 Key: ARROW-8381
 URL: https://issues.apache.org/jira/browse/ARROW-8381
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


# Dataset writing should always take an explicit writer schema instead of the 
first fragment's schema.
# The MakeWritePlanImpl should not try removing columns that are found in the 
partition, this is left to the caller by passing an explicit schema.







[jira] [Created] (ARROW-8374) [R] Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-04-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8374:
-

 Summary: [R] Table to vector of DictonaryType will error when 
Arrays don't have the same Dictionary per array
 Key: ARROW-8374
 URL: https://issues.apache.org/jira/browse/ARROW-8374
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


The conversion should unify the dictionaries before converting; otherwise the 
indices are simply broken.
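
A minimal sketch of what unifying means here, using plain standard-library containers. `DictChunk`, `UnifiedDict` and `UnifyChunks` are hypothetical names, string dictionaries are assumed, and nulls are ignored for brevity.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one dictionary-encoded chunk.
struct DictChunk {
  std::vector<std::string> dictionary;
  std::vector<int32_t> indices;  // positions into `dictionary`
};

struct UnifiedDict {
  std::vector<std::string> dictionary;  // union, in first-seen order
  std::vector<int32_t> indices;         // all chunks, remapped and concatenated
};

inline UnifiedDict UnifyChunks(const std::vector<DictChunk>& chunks) {
  UnifiedDict out;
  std::unordered_map<std::string, int32_t> position;
  for (const auto& chunk : chunks) {
    // transpose[i] is the unified index of this chunk's dictionary entry i.
    std::vector<int32_t> transpose;
    transpose.reserve(chunk.dictionary.size());
    for (const auto& value : chunk.dictionary) {
      auto it = position.find(value);
      if (it == position.end()) {
        const auto next = static_cast<int32_t>(out.dictionary.size());
        it = position.emplace(value, next).first;
        out.dictionary.push_back(value);
      }
      transpose.push_back(it->second);
    }
    for (int32_t index : chunk.indices) {
      out.indices.push_back(transpose[static_cast<std::size_t>(index)]);
    }
  }
  return out;
}
```

Without the remapping step, index 0 of a second chunk with dictionary `["b", "c"]` would be read against the first chunk's dictionary and come out as the wrong level.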





[jira] [Created] (ARROW-8348) [C++] Support optional sentinel values in primitive Array for nulls

2020-04-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8348:
-

 Summary: [C++] Support optional sentinel values in primitive Array 
for nulls
 Key: ARROW-8348
 URL: https://issues.apache.org/jira/browse/ARROW-8348
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


This is an optional feature where a sentinel value is stored in null cells and 
is exposed via an accessor method, e.g. `optional Array::HasSentinel() 
const;`. This would allow zero-copy bi-directional conversion with R.





[jira] [Created] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment

2020-04-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8318:
-

 Summary: [C++][Dataset] Dataset should instantiate Fragment
 Key: ARROW-8318
 URL: https://issues.apache.org/jira/browse/ARROW-8318
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


Fragments are created on the fly when invoking a Scan. This means that a lot of 
the auxiliary data must be stored by the specialised Dataset, e.g. the 
FileSystemDataset must hold the path and partition expression. With the advent 
of more complex Fragments, e.g. ParquetFileFragment, more data must be 
stored. 





[jira] [Created] (ARROW-8065) [C++][Dataset] Untangle Dataset, Fragment and ScanOptions

2020-03-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8065:
-

 Summary: [C++][Dataset] Untangle Dataset, Fragment and ScanOptions
 Key: ARROW-8065
 URL: https://issues.apache.org/jira/browse/ARROW-8065
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


We should be able to list fragments without going through the 
Scanner/ScanOptions hoops. This exposes a flaw in the current API, which 
requires a ScanOptions to create a Fragment. This is also a problem for 
ARROW-7824, i.e. why do we need a ScanOptions (read manifest) to write record 
batches to a given path?
 # Remove {{ScanOptions}} from Fragment's properties and move it into 
{{Fragment::Scan}} parameters.
 # Remove {{ScanOptions}} from {{Dataset::GetFragments}}, if required, we can 
still provide an alternate signature, e.g. 
{{Dataset::GetFragments(std::shared_ptr predicate)}} for sub-tree 
pruning in FileSystemDataset.
 # Fragment constructor should take a schema (and store it as a property), 
usually extracted from the Dataset schema. Update the schema() method 
accordingly.





[jira] [Created] (ARROW-7917) [CMake] FindPythonInterp should check for python3

2020-02-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7917:
-

 Summary: [CMake] FindPythonInterp should check for python3
 Key: ARROW-7917
 URL: https://issues.apache.org/jira/browse/ARROW-7917
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.16.0
Reporter: Francois Saint-Jacques


On Ubuntu 18.04, it'll pick python2 by default.





[jira] [Created] (ARROW-7878) [C++] Implement LogicalPlan and LogicalPlanBuilder

2020-02-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7878:
-

 Summary: [C++] Implement LogicalPlan and LogicalPlanBuilder
 Key: ARROW-7878
 URL: https://issues.apache.org/jira/browse/ARROW-7878
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++ - Compute
Affects Versions: 1.0.0
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7861) [C++][Parquet] Add fuzz regression corpus for parquet reader

2020-02-14 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7861:
-

 Summary: [C++][Parquet] Add fuzz regression corpus for parquet 
reader
 Key: ARROW-7861
 URL: https://issues.apache.org/jira/browse/ARROW-7861
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7821) [Gandiva] Add support for literal variables

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7821:
-

 Summary: [Gandiva] Add support for literal variables
 Key: ARROW-7821
 URL: https://issues.apache.org/jira/browse/ARROW-7821
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++ - Gandiva
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


Gandiva supports static literal constants, but doesn't support runtime literal 
constants (or simply, variables). This means that queries like `x > 1` and `x > 
2` are compiled into separate operators. The goal would be to provide something 
like a prepared statement for very simple expressions, e.g. `x > ?`. This way 
we can pre-generate operators for the most basic comparison filters on every 
type.

I'm thinking that the variables should be stashed in the context pointer as 
opposed to a new function parameter. This would minimise the implementation 
impact.
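
To illustrate the idea (not Gandiva's actual interfaces), the sketch below uses one function as a stand-in for a single pre-generated `x > ?` operator that reads its threshold from a context object. `ExecContext` and `GreaterThanVariable` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical execution context carrying the runtime literals ("variables").
struct ExecContext {
  std::vector<int64_t> variables;
};

// Stand-in for one pre-generated operator implementing `x > ?`.
// The threshold is read from the context instead of being baked into the code,
// so the same operator serves `x > 1` and `x > 2`.
inline std::vector<bool> GreaterThanVariable(const ExecContext& ctx,
                                             std::size_t variable_index,
                                             const std::vector<int64_t>& column) {
  const int64_t threshold = ctx.variables.at(variable_index);
  std::vector<bool> selected;
  selected.reserve(column.size());
  for (int64_t x : column) selected.push_back(x > threshold);
  return selected;
}
```

Passing the variables through the context keeps the operator's signature unchanged, which is the low-impact property argued for above.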





[jira] [Created] (ARROW-7820) [C++][Gandiva] Add CMake support for compiling LLVM's IR into a library

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7820:
-

 Summary: [C++][Gandiva] Add CMake support for compiling LLVM's IR 
into a library
 Key: ARROW-7820
 URL: https://issues.apache.org/jira/browse/ARROW-7820
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


We should be able to inject LLVM IR into libraries, assuming that `llc` is 
found on the platform.





[jira] [Created] (ARROW-7819) [C++][Gandiva] Implement gandiva-dump-ir tool to output llvm IR to a file

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7819:
-

 Summary: [C++][Gandiva] Implement gandiva-dump-ir tool to output 
llvm IR to a file
 Key: ARROW-7819
 URL: https://issues.apache.org/jira/browse/ARROW-7819
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++ - Gandiva
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


The tool should take a protobuf expression from stdin and dump the IR to 
stdout. This might require some thought, as the schema is not always known. It 
could mean a refactor to support plain arrays, especially for the Filter kernel.





[jira] [Created] (ARROW-7818) [C++][Gandiva] Generate Filter kernels from gandiva code at compile time

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7818:
-

 Summary: [C++][Gandiva] Generate Filter kernels from gandiva code 
at compile time
 Key: ARROW-7818
 URL: https://issues.apache.org/jira/browse/ARROW-7818
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, C++ - Gandiva
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 1.0.0


The goal of this feature is to support generating kernels at compile time (and 
possibly runtime if gandiva is linked) to avoid rewriting C++ kernels that 
gandiva knows how to compile. The generated kernels would be linked in the 
compute module. 

This is an experimental task that will guide future development, notably 
implementing aggregate kernels once in gandiva instead of maintaining both C++ 
and gandiva implementations.





[jira] [Created] (ARROW-7798) [R] Refactor vector to Array conversion

2020-02-07 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7798:
-

 Summary: [R] Refactor vector to Array conversion
 Key: ARROW-7798
 URL: https://issues.apache.org/jira/browse/ARROW-7798
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Francois Saint-Jacques


There's a bit of technical debt accumulated in this file:

* Mix of conversion *and* casting; ideally we'd move casting out of there (at 
the cost of more memory copies). The rationale is that the conversion logic 
will differ from the CastKernels, e.g. when to raise errors, benefits from 
complex conversions like timezone... The current implementation is fast, e.g. 
it fuses the conversion and casting in a single loop, at the cost of code 
clarity and divergence.
* There should be two paths: zero-copy and non-zero-copy. The non-zero-copy 
path should use the newly introduced VectorToArrayConverter, which will work 
with complex nested types.





[jira] [Created] (ARROW-7767) [C++] Add a facility to create a Bitmap buffer from an data pointer with a specified sentinel

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7767:
-

 Summary: [C++] Add a facility to create a Bitmap buffer from an 
data pointer with a specified sentinel
 Key: ARROW-7767
 URL: https://issues.apache.org/jira/browse/ARROW-7767
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Francois Saint-Jacques


This is a special case for R and other cases where the null value is 
represented by a sentinel. This would read the data pointer and return a null 
bitmap buffer where bits are set for every row where the value is not the 
sentinel value. If no sentinel is encountered, return nullptr. 


{code:c++}
template <typename CType>
Result<std::shared_ptr<Buffer>> NullBitmapFromSentinelData(
    MemoryPool* pool, const CType* data, size_t n_values, CType sentinel_value);
{code}
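
A possible shape for the body, sketched with the standard library only. `ValidityFromSentinelSketch` is a hypothetical name, a `std::vector<uint8_t>` stands in for the Buffer, and no MemoryPool is used. Bits are packed least-significant-bit first, as in Arrow validity bitmaps. Note that the `==` comparison would not work for NaN-based sentinels.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Returns a validity bitmap (bit set = value is not the sentinel), or nullptr
// when no sentinel was encountered and no bitmap is needed.
template <typename CType>
std::unique_ptr<std::vector<uint8_t>> ValidityFromSentinelSketch(
    const CType* data, std::size_t n_values, CType sentinel) {
  std::vector<uint8_t> bitmap((n_values + 7) / 8, 0);
  bool saw_sentinel = false;
  for (std::size_t i = 0; i < n_values; ++i) {
    if (data[i] == sentinel) {
      saw_sentinel = true;  // leave the bit cleared: this row is null
    } else {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  if (!saw_sentinel) return nullptr;
  return std::unique_ptr<std::vector<uint8_t>>(
      new std::vector<uint8_t>(std::move(bitmap)));
}
```

A production version could delay allocating the bitmap until the first sentinel is seen, so that fully valid inputs cost a single read-only pass.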






[jira] [Created] (ARROW-7765) [C++] Add Result to the Visitor pattern

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7765:
-

 Summary: [C++] Add Result to the Visitor pattern
 Key: ARROW-7765
 URL: https://issues.apache.org/jira/browse/ARROW-7765
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7764) [C++] Builders allocate a null bitmap buffer even if there is no nulls

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7764:
-

 Summary: [C++] Builders allocate a null bitmap buffer even if 
there is no nulls
 Key: ARROW-7764
 URL: https://issues.apache.org/jira/browse/ARROW-7764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This is an optimization where we can coalesce to nullptr if there's no null in 
the array.





[jira] [Created] (ARROW-7761) [C++] Add S3 support to fs::FileSystemFromUri

2020-02-04 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7761:
-

 Summary: [C++] Add S3 support to fs::FileSystemFromUri
 Key: ARROW-7761
 URL: https://issues.apache.org/jira/browse/ARROW-7761
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


FileSystemFromUri doesn't support S3. This would give almost immediate support 
for S3 in python/R.





[jira] [Created] (ARROW-7759) [C++][Dataset] Add CsvFileFormat for CSV support

2020-02-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7759:
-

 Summary: [C++][Dataset] Add CsvFileFormat for CSV support
 Key: ARROW-7759
 URL: https://issues.apache.org/jira/browse/ARROW-7759
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


This should be a minimal implementation that binds 1-1 file and ScanTask for 
now. Streaming optimizations can be done in ARROW-3410.





[jira] [Created] (ARROW-7673) [C++][Dataset] Revisit File discovery failure mode

2020-01-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7673:
-

 Summary: [C++][Dataset] Revisit File discovery failure mode
 Key: ARROW-7673
 URL: https://issues.apache.org/jira/browse/ARROW-7673
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


Currently, the default `FileSystemFactoryOptions::exclude_invalid_files` will 
silently ignore unsupported files (either IO error, not of the valid format, 
corruption, missing compression codecs, etc...) when creating a 
`FileSystemSource`.

We should change this behavior to propagate an error in the Inspect/Finish 
calls by default and allow the user to toggle `exclude_invalid_files`. The 
error should contain at least the file path and a decipherable error (if 
possible).
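
The proposed default can be sketched as follows. This is illustrative only: `Candidate`, `DiscoveryResult` and `Discover` are hypothetical names, and file validation is reduced to a pre-computed `problem` string.

```cpp
#include <string>
#include <vector>

// Hypothetical: a discovered path plus the reason it is unusable, if any.
struct Candidate {
  std::string path;
  std::string problem;  // empty when the file is valid
};

struct DiscoveryResult {
  bool ok = true;
  std::string error;               // set when ok is false
  std::vector<std::string> files;  // accepted paths
};

// By default the first invalid file aborts discovery with an error naming the
// path; with exclude_invalid_files the file is skipped instead.
inline DiscoveryResult Discover(const std::vector<Candidate>& candidates,
                                bool exclude_invalid_files = false) {
  DiscoveryResult result;
  for (const auto& candidate : candidates) {
    if (candidate.problem.empty()) {
      result.files.push_back(candidate.path);
      continue;
    }
    if (exclude_invalid_files) continue;  // opt-in: silently ignore
    result.ok = false;
    result.error =
        "Invalid file '" + candidate.path + "': " + candidate.problem;
    result.files.clear();
    return result;
  }
  return result;
}
```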





[jira] [Created] (ARROW-7653) [C++][Dataset] Handle DictType index mismatch better

2020-01-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7653:
-

 Summary: [C++][Dataset] Handle DictType index mismatch better
 Key: ARROW-7653
 URL: https://issues.apache.org/jira/browse/ARROW-7653
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


There will be a schema incompatibility raised if the index width doesn't match 
for fragments/sources.





[jira] [Created] (ARROW-7602) [Archery] Add more build options

2020-01-17 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7602:
-

 Summary: [Archery] Add more build options
 Key: ARROW-7602
 URL: https://issues.apache.org/jira/browse/ARROW-7602
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques








[jira] [Created] (ARROW-7498) [C++][Dataset] Rename DataFragment/DataSource/PartitionScheme

2020-01-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7498:
-

 Summary: [C++][Dataset] Rename 
DataFragment/DataSource/PartitionScheme
 Key: ARROW-7498
 URL: https://issues.apache.org/jira/browse/ARROW-7498
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


DataFragment -> Fragment
DataSource -> Source
PartitionSchema -> PartitionSchema
*Discovery -> *Manifest





[jira] [Created] (ARROW-7441) [C++] Remove compute pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7441:
-

 Summary: [C++] Remove compute pointer aliases
 Key: ARROW-7441
 URL: https://issues.apache.org/jira/browse/ARROW-7441
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7439) [C++][Dataset] Remove dataset pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7439:
-

 Summary: [C++][Dataset] Remove dataset pointer aliases
 Key: ARROW-7439
 URL: https://issues.apache.org/jira/browse/ARROW-7439
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7440) [C++][Gandiva] Remove gandiva pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7440:
-

 Summary: [C++][Gandiva] Remove gandiva pointer aliases
 Key: ARROW-7440
 URL: https://issues.apache.org/jira/browse/ARROW-7440
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-7438) [C++] Remove pointer aliases

2019-12-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7438:
-

 Summary: [C++] Remove pointer aliases 
 Key: ARROW-7438
 URL: https://issues.apache.org/jira/browse/ARROW-7438
 Project: Apache Arrow
  Issue Type: Wish
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7436) [Archery] Fix benchmark default configuration

2019-12-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7436:
-

 Summary: [Archery] Fix benchmark default configuration
 Key: ARROW-7436
 URL: https://issues.apache.org/jira/browse/ARROW-7436
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


The compute module is no longer built since the default CMake configuration was slimmed down.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7390) [C++][Dataset] Concurrency race in Projector::Project

2019-12-13 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7390:
-

 Summary: [C++][Dataset] Concurrency race in Projector::Project 
 Key: ARROW-7390
 URL: https://issues.apache.org/jira/browse/ARROW-7390
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


When a DataFragment is invoked by 2 scan tasks of the same DataFragment, 
there's a race to invoke SetInputSchema. Note that ResizeMissingColumns also 
suffers from this race. The ideal goal is to make Project a const method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7380) [C++][Dataset] Implement DatasetDiscovery

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7380:
-

 Summary: [C++][Dataset] Implement DatasetDiscovery
 Key: ARROW-7380
 URL: https://issues.apache.org/jira/browse/ARROW-7380
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


Takes a list of DataSourceDiscovery and yields a Dataset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7379) [C++] Introduce Field::CompatiblesWith and Schema::CompatiblesWith

2019-12-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7379:
-

 Summary: [C++] Introduce Field::CompatiblesWith and 
Schema::CompatiblesWith
 Key: ARROW-7379
 URL: https://issues.apache.org/jira/browse/ARROW-7379
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


The methods verify whether fields/schemas are compatible with regard to naming 
and type. This is partly extracted from `UnifySchemas`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7360) [R] Can't use dataset's filter with non-literal expression

2019-12-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7360:
-

 Summary: [R] Can't use dataset's filter with non-literal expression
 Key: ARROW-7360
 URL: https://issues.apache.org/jira/browse/ARROW-7360
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Francois Saint-Jacques


The following will generate an error


{code:r}
test_that("filtering with expression", {
  char_sym <- "b"
  expect_dplyr_equal(
    input %>%
      filter(chr == char_sym) %>%
      select(string = chr, int) %>%
      collect(),
    tbl
  )
})
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7339) [CMake] Thrift version not respected in CMake configuration version.txt

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7339:
-

 Summary: [CMake] Thrift version not respected in CMake 
configuration version.txt
 Key: ARROW-7339
 URL: https://issues.apache.org/jira/browse/ARROW-7339
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


If Thrift is requested via BUNDLED, Thrift 0.9.1 will be downloaded instead of 
the requested version. This is due to FindThrift.cmake overriding 
THRIFT_VERSION from the locally installed thrift compiler (0.9.1 on Ubuntu 
18.04).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7338) [C++] Rename SimpleDataSource to InMemoryDataSource

2019-12-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7338:
-

 Summary: [C++] Rename SimpleDataSource to InMemoryDataSource
 Key: ARROW-7338
 URL: https://issues.apache.org/jira/browse/ARROW-7338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


The constructor should take a generator

{code:c++}
// Some comments here
class InMemoryDataSource : public DataSource {
  public:
using Generator = std::function>;

InMemoryDataSource(Generator&& generator);
// Convenience constructor to support a fixed list of RecordBatch
InMemoryDataSource(std::shared_ptr);
InMemoryDataSource(std::vector>);

private:
  Generator generator;
}
{code}
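
A minimal, self-contained sketch of the idea. The template arguments are assumptions, `RecordBatch` is a stand-in struct rather than the Arrow class, and `GetBatches` is a hypothetical accessor added only for illustration:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for arrow::RecordBatch, only here to keep the sketch self-contained.
struct RecordBatch {
  int num_rows;
};

using BatchPtr = std::shared_ptr<RecordBatch>;
using BatchVector = std::vector<BatchPtr>;

class InMemoryDataSource {
 public:
  // Assumed shape: every call yields a fresh sequence of batches.
  using Generator = std::function<BatchVector()>;

  explicit InMemoryDataSource(Generator generator)
      : generator_(std::move(generator)) {}

  // Convenience constructor: wrap a fixed list of batches in a generator.
  explicit InMemoryDataSource(BatchVector batches)
      : generator_([batches = std::move(batches)] { return batches; }) {}

  // Hypothetical accessor: re-invokes the generator on each call.
  BatchVector GetBatches() const { return generator_(); }

 private:
  Generator generator_;
};
```

Both constructors end up storing a generator, so the rest of the class only has to deal with one code path.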




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7272) [C++][Java] JNI bridge between RecordBatch and VectorSchemaRoot

2019-11-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7272:
-

 Summary: [C++][Java] JNI bridge between RecordBatch and 
VectorSchemaRoot
 Key: ARROW-7272
 URL: https://issues.apache.org/jira/browse/ARROW-7272
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Java
Reporter: Francois Saint-Jacques


Given a C++ std::shared_ptr<RecordBatch>, retrieve it in Java as a 
VectorSchemaRoot. Gandiva already offers a similar facility but with raw 
buffers. It would be convenient if users could call C++ code that yields a 
RecordBatch and retrieve it in a seamless fashion.

This would remove one roadblock of using C++ dataset facility in Java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7265) [Format][C++] Clarify the usage of typeIds in Union type documentation

2019-11-26 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7265:
-

 Summary: [Format][C++] Clarify the usage of typeIds in Union type 
documentation
 Key: ARROW-7265
 URL: https://issues.apache.org/jira/browse/ARROW-7265
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The documentation is unclear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7210) [C++] Scalar cast should support time-based types

2019-11-19 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7210:
-

 Summary: [C++] Scalar cast should support time-based types
 Key: ARROW-7210
 URL: https://issues.apache.org/jira/browse/ARROW-7210
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This would allow supporting a minimum of expression evaluation on time-based 
arrays.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7178) [C++] Vendor forward compatible std::optional

2019-11-15 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7178:
-

 Summary: [C++] Vendor forward compatible std::optional
 Key: ARROW-7178
 URL: https://issues.apache.org/jira/browse/ARROW-7178
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Having std::optional was mentioned a few times; [~emkornfi...@gmail.com] 
suggested https://github.com/martinmoene/optional-lite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7148) [C++][Dataset] API cleanup

2019-11-12 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7148:
-

 Summary: [C++][Dataset] API cleanup
 Key: ARROW-7148
 URL: https://issues.apache.org/jira/browse/ARROW-7148
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7079) [C++][Dataset] Implement ScalarAsStatisctics for non-primitive types

2019-11-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7079:
-

 Summary: [C++][Dataset] Implement ScalarAsStatisctics for 
non-primitive types
 Key: ARROW-7079
 URL: https://issues.apache.org/jira/browse/ARROW-7079
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Dataset
Reporter: Francois Saint-Jacques


Statistics are not extracted for the following (parquet) types

- BYTE_ARRAY
- FLBA
- Any logical timestamps/dates



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types

2019-10-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7017:
-

 Summary: [C++] Refactor AddKernel to support other operations and 
types
 Key: ARROW-7017
 URL: https://issues.apache.org/jira/browse/ARROW-7017
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Compute
Reporter: Francois Saint-Jacques


 * Should avoid using builders (and/or NULLs) since the output shape is known at 
compute time.
 * Should be refactored to support other operations, e.g. Subtraction, 
Multiplication.
 * Should have an overflow/underflow detection mode.
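
A rough sketch of what such a refactoring could look like, not the actual kernel: the operation becomes a template parameter, the output is preallocated because its length is known up front, and overflow is detected by computing in a wider type. All names here (`Narrow`, `Add`, `Subtract`, `Multiply`, `CheckedArithmetic`) are hypothetical, and plain `std::vector<int32_t>` stands in for Arrow arrays:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stores `wide` into `out` if it fits in int32_t, otherwise reports overflow.
inline bool Narrow(int64_t wide, int32_t* out) {
  if (wide > std::numeric_limits<int32_t>::max() ||
      wide < std::numeric_limits<int32_t>::min()) {
    return false;
  }
  *out = static_cast<int32_t>(wide);
  return true;
}

// Each operation computes in int64_t, which cannot overflow for int32_t inputs.
struct Add {
  static bool Call(int32_t a, int32_t b, int32_t* out) {
    return Narrow(static_cast<int64_t>(a) + b, out);
  }
};
struct Subtract {
  static bool Call(int32_t a, int32_t b, int32_t* out) {
    return Narrow(static_cast<int64_t>(a) - b, out);
  }
};
struct Multiply {
  static bool Call(int32_t a, int32_t b, int32_t* out) {
    return Narrow(static_cast<int64_t>(a) * b, out);
  }
};

// Returns false on length mismatch or on the first overflow/underflow.
template <typename Op>
bool CheckedArithmetic(const std::vector<int32_t>& left,
                       const std::vector<int32_t>& right,
                       std::vector<int32_t>* out) {
  if (left.size() != right.size()) return false;
  out->resize(left.size());  // output shape is known: no builder needed
  for (std::size_t i = 0; i < left.size(); ++i) {
    if (!Op::Call(left[i], right[i], &(*out)[i])) return false;
  }
  return true;
}
```

For example, `CheckedArithmetic<Multiply>(a, b, &out)` reuses the same loop as addition; only the `Op` policy changes.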



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7007) [C++] Enable mmap option for LocalFs

2019-10-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7007:
-

 Summary: [C++] Enable mmap option for LocalFs
 Key: ARROW-7007
 URL: https://issues.apache.org/jira/browse/ARROW-7007
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6988) [CI][R] Buildbot's R Conda is failing

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6988:
-

 Summary: [CI][R] Buildbot's R Conda is failing
 Key: ARROW-6988
 URL: https://issues.apache.org/jira/browse/ARROW-6988
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


{code:java}
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  25: tryCatch(withCallingHandlers({eval(code, test_env)if (!handled && 
!is.null(test)) {skip_empty()}}, expectation = handle_expectation, 
skip = handle_skip, warning = handle_warning, message = handle_message, 
error = handle_error), error = handle_fatal, skip = function(e) {})
  26: test_code(NULL, exprs, env)
  27: source_file(path, new.env(parent = env), chdir = TRUE, wrap = wrap)
  28: force(code)
  29: with_reporter(reporter = reporter, start_end_reporter = 
start_end_reporter, {reporter$start_file(basename(path))
lister$start_file(basename(path))source_file(path, new.env(parent = 
env), chdir = TRUE, wrap = wrap)reporter$.end_context() 
   reporter$end_file()})
  30: FUN(X[[i]], ...)
  31: lapply(paths, test_file, env = env, reporter = current_reporter, 
start_end_reporter = FALSE, load_helpers = FALSE, wrap = wrap)
  32: force(code)
  33: with_reporter(reporter = current_reporter, results <- lapply(paths, 
test_file, env = env, reporter = current_reporter, start_end_reporter = FALSE,  
   load_helpers = FALSE, wrap = wrap))
  34: test_files(paths, reporter = reporter, env = env, stop_on_failure = 
stop_on_failure, stop_on_warning = stop_on_warning, wrap = wrap)
  35: test_dir(path = test_path, reporter = reporter, env = env, filter = 
filter, ..., stop_on_failure = stop_on_failure, stop_on_warning = 
stop_on_warning, wrap = wrap)
  36: test_package_dir(package = package, test_path = test_path, filter = 
filter, reporter = reporter, ..., stop_on_failure = stop_on_failure, 
stop_on_warning = stop_on_warning, wrap = wrap)
  37: test_check("arrow")
  An irrecoverable exception occurred. R is aborting now ...
  Segmentation fault (core dumped)
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: 1 ERROR, 1 WARNING, 2 NOTEs
See
  ‘/buildbot/AMD64_Conda_R/r/arrow.Rcheck/00check.log’
for details.
 {code}
[https://ci.ursalabs.org/#/builders/95/builds/2386] 
[https://ci.ursalabs.org/#/builders/95]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6987) [CI] Travis OSX failing to install sdk headers

2019-10-24 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6987:
-

 Summary: [CI] Travis OSX failing to install sdk headers
 Key: ARROW-6987
 URL: https://issues.apache.org/jira/browse/ARROW-6987
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques


{code:java}
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
installer: Package name is macOS_SDK_headers_for_macOS_10.14
installer: Certificate used to sign package is not trusted. Use -allowUntrusted to override.
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh --only-library --homebrew" failed and exited with 1 during .
{code}
See [https://travis-ci.org/apache/arrow/jobs/602434884#L342-L345]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6969) [C++][Dataset] ParquetScanTask eagerly load file

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6969:
-

 Summary: [C++][Dataset] ParquetScanTask eagerly load file 
 Key: ARROW-6969
 URL: https://issues.apache.org/jira/browse/ARROW-6969
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The file content should only be read when invoking ParquetScanTask::Scan, not 
on construction. This blocks reading in a true streaming fashion with memory 
constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6965) [C++][Dataset] Optionally expose partition keys as materialized columns

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6965:
-

 Summary: [C++][Dataset] Optionally expose partition keys as 
materialized columns
 Key: ARROW-6965
 URL: https://issues.apache.org/jira/browse/ARROW-6965
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6964) [C++][Dataset] Expose a nested parallel option for Scanner

2019-10-22 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6964:
-

 Summary: [C++][Dataset] Expose a nested parallel option for Scanner
 Key: ARROW-6964
 URL: https://issues.apache.org/jira/browse/ARROW-6964
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6956) [C++] Status should use unique_ptr

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6956:
-

 Summary: [C++] Status should use unique_ptr
 Key: ARROW-6956
 URL: https://issues.apache.org/jira/browse/ARROW-6956
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


The logic of Status::State is _very_ similar to unique_ptr, except for the deep 
copy on copy.
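
A sketch of the idea, assuming a simplified `State` with a code and a message (not the real Arrow layout): the state is held in a `std::unique_ptr`, a null pointer means OK, moves are defaulted, and only the copy operations need hand-written deep-copy logic.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Status {
 public:
  Status() = default;  // OK status: no allocation, null state
  Status(int code, std::string message)
      : state_(std::make_unique<State>(State{code, std::move(message)})) {}

  // Copy performs a deep copy of the state.
  Status(const Status& other) : state_(Clone(other.state_)) {}
  Status& operator=(const Status& other) {
    if (this != &other) state_ = Clone(other.state_);
    return *this;
  }

  // Moves fall out of unique_ptr for free.
  Status(Status&&) noexcept = default;
  Status& operator=(Status&&) noexcept = default;

  bool ok() const { return state_ == nullptr; }
  std::string message() const { return state_ ? state_->message : std::string(); }

 private:
  // Simplified, assumed layout of the error state.
  struct State {
    int code;
    std::string message;
  };

  static std::unique_ptr<State> Clone(const std::unique_ptr<State>& state) {
    if (!state) return nullptr;
    return std::make_unique<State>(*state);
  }

  std::unique_ptr<State> state_;
};
```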



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6953) [C++][Dataset] Implement Gandiva Filter/Projector in Scanner

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6953:
-

 Summary: [C++][Dataset] Implement Gandiva Filter/Projector in 
Scanner
 Key: ARROW-6953
 URL: https://issues.apache.org/jira/browse/ARROW-6953
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


Currently, we have `RecordBatchProjector` and `ExpressionEvaluator` to achieve 
this feature. This would implement a single class that fuses both and uses 
Gandiva. This would be exposed in the ScannerBuilder via an option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6951) [C++][Dataset] Ensure column projection is passed to ParquetDataFragment

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6951:
-

 Summary: [C++][Dataset] Ensure column projection is passed to 
ParquetDataFragment
 Key: ARROW-6951
 URL: https://issues.apache.org/jira/browse/ARROW-6951
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6952) [C++][Dataset] Ensure expression filter is passed ParquetDataFragment

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6952:
-

 Summary: [C++][Dataset] Ensure expression filter is passed 
ParquetDataFragment
 Key: ARROW-6952
 URL: https://issues.apache.org/jira/browse/ARROW-6952
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


We should be able to prune RowGroups based on the expression and the statistics.
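
To illustrate the kind of pruning meant here, a small sketch under simplifying assumptions: a single int64 column, min/max statistics per row group, and a predicate of the form `column <op> value`. The types `RowGroupStats`, `Predicate` and the function `SelectRowGroups` are hypothetical and not part of the Parquet or Dataset API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics of one column in one row group.
struct RowGroupStats {
  int64_t min;
  int64_t max;
};

enum class CompareOp { EQUAL, LESS, GREATER };

// Represents the predicate `column <op> value`.
struct Predicate {
  CompareOp op;
  int64_t value;
};

// Returns false only when the statistics prove that no row can match.
bool MayMatch(const RowGroupStats& stats, const Predicate& predicate) {
  switch (predicate.op) {
    case CompareOp::EQUAL:
      return stats.min <= predicate.value && predicate.value <= stats.max;
    case CompareOp::LESS:
      return stats.min < predicate.value;
    case CompareOp::GREATER:
      return stats.max > predicate.value;
  }
  return true;  // unknown operator: stay conservative and keep the row group
}

// Indices of the row groups that still have to be read.
std::vector<std::size_t> SelectRowGroups(const std::vector<RowGroupStats>& groups,
                                         const Predicate& predicate) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    if (MayMatch(groups[i], predicate)) selected.push_back(i);
  }
  return selected;
}
```

The check is deliberately conservative: a row group is skipped only when its statistics rule out every row, so the filter still has to be applied to the rows that are read.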



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6950) [C++][Dataset] Add example/benchmark for reading parquet files with dataset

2019-10-21 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6950:
-

 Summary: [C++][Dataset] Add example/benchmark for reading parquet 
files with dataset
 Key: ARROW-6950
 URL: https://issues.apache.org/jira/browse/ARROW-6950
 Project: Apache Arrow
  Issue Type: Test
  Components: C++
Reporter: Francois Saint-Jacques


Create an executable that loads a directory with a known partition scheme with a 
filter and a projection. This will be used as a baseline for future performance 
improvements but also to show various features of the dataset API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6902) [C++] Add String*/Binary* support for Compare kernels

2019-10-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6902:
-

 Summary: [C++] Add String*/Binary* support for Compare kernels
 Key: ARROW-6902
 URL: https://issues.apache.org/jira/browse/ARROW-6902
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6854) [Dataset] RecordBatchProjector is not thread safe

2019-10-11 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6854:
-

 Summary: [Dataset] RecordBatchProjector is not thread safe
 Key: ARROW-6854
 URL: https://issues.apache.org/jira/browse/ARROW-6854
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques


While working on ARROW-6769 I noted that RecordBatchProjector is not thread 
safe. My goal is to use this class to wrap the ScanTaskIterator in another 
ScanTaskIterator that projects, so producers (fragments) don't have to know 
about this schema. The issue is that ScanTasks are expected to run on concurrent 
threads, so the projector will be invoked by multiple threads.

The lack of concurrency safety is due to the adaptivity to input schemas: 
`SetInputSchema` stores them in a local cache. I suggest we refactor into 2 classes. 
 # `RecordBatchProjector` which will work with a static `from` schema, i.e. no 
adaptivity. The schema is defined at construction time. This class is thread safe 
to invoke after construction since no local modification is done.
 # `AdaptiveRecordBatchProjector` which will have a cache map[schema_hash, 
std::shared_ptr<RecordBatchProjector>] protected with a mutex. 
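
A simplified sketch of the proposed split. `Schema` and `Batch` are stand-ins (a list of field names and one int per field) rather than the Arrow types, missing columns are filled with 0 instead of nulls, and hash collisions between schemas are ignored for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Schema = std::vector<std::string>;  // stand-in: field names only
using Batch = std::vector<int>;           // stand-in: one value per field

// Static projector: the `from` schema is fixed at construction time.
class RecordBatchProjector {
 public:
  RecordBatchProjector(const Schema& from, const Schema& to) {
    for (const auto& name : to) {
      int index = -1;  // -1 marks a column missing from `from`
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      indices_.push_back(index);
    }
  }

  // const and free of mutation, so concurrent calls are safe.
  Batch Project(const Batch& batch) const {
    Batch out;
    out.reserve(indices_.size());
    for (int index : indices_) out.push_back(index < 0 ? 0 : batch[index]);
    return out;
  }

 private:
  std::vector<int> indices_;
};

// Adaptive projector: one static projector per input schema, cached under a mutex.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Schema& from, const Batch& batch) {
    return GetProjector(from)->Project(batch);  // projection runs outside the lock
  }

  std::size_t cache_size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  std::shared_ptr<const RecordBatchProjector> GetProjector(const Schema& from) {
    const std::size_t key = HashSchema(from);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, std::make_shared<const RecordBatchProjector>(from, to_)).first;
    }
    return it->second;
  }

  static std::size_t HashSchema(const Schema& schema) {
    std::size_t seed = 0;
    for (const auto& name : schema) seed = seed * 31 + std::hash<std::string>{}(name);
    return seed;
  }

  Schema to_;
  mutable std::mutex mutex_;
  std::unordered_map<std::size_t, std::shared_ptr<const RecordBatchProjector>> cache_;
};
```

The mutex only guards the cache lookup; the projection itself goes through the immutable static projector, so scan tasks do not serialize on it.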



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6835) [Archery][CMake] Restore ARROW_LINT_ONLY

2019-10-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6835:
-

 Summary: [Archery][CMake] Restore ARROW_LINT_ONLY  
 Key: ARROW-6835
 URL: https://issues.apache.org/jira/browse/ARROW-6835
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: Francois Saint-Jacques


This is used by developers to speed up the CMake build generation and to relax 
the required installed toolchain (notably libraries). This was yanked because 
ARROW_LINT_ONLY effectively exits early and doesn't generate 
`compile_commands.json`.

Restore this option, but ensure that archery toggles it according to the usage 
of iwyu or clang-tidy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6827) [Archery] lint sub-command should provide a --fail-fast option

2019-10-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6827:
-

 Summary: [Archery] lint sub-command should provide a --fail-fast 
option
 Key: ARROW-6827
 URL: https://issues.apache.org/jira/browse/ARROW-6827
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Archery
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6828) [Archery] Benchmark diff should provide a TUI friendly output

2019-10-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6828:
-

 Summary: [Archery] Benchmark diff should provide a TUI friendly 
output
 Key: ARROW-6828
 URL: https://issues.apache.org/jira/browse/ARROW-6828
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Archery
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6826) [Archery] Default build should be minimal

2019-10-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6826:
-

 Summary: [Archery] Default build should be minimal
 Key: ARROW-6826
 URL: https://issues.apache.org/jira/browse/ARROW-6826
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Archery
Reporter: Francois Saint-Jacques


Follow-up of https://github.com/apache/arrow/pull/5600/files#r332655141



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6816) [Archery] Cleanup integration module to use companion classes

2019-10-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6816:
-

 Summary: [Archery] Cleanup integration module to use companion 
classes
 Key: ARROW-6816
 URL: https://issues.apache.org/jira/browse/ARROW-6816
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Archery
Reporter: Francois Saint-Jacques


This is a followup ticket to ARROW-6466.


 * Replace print calls with utils.logger
 * Use ArrowSources instead of ARROW_HOME
 * Use utils.Command and utils.CMakeBuild where possible



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6769) [C++][Dataset] End to End dataset integration test case

2019-10-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6769:
-

 Summary: [C++][Dataset] End to End dataset integration test case
 Key: ARROW-6769
 URL: https://issues.apache.org/jira/browse/ARROW-6769
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


1. Create a DataSource from a known directory and a PartitionScheme. 
2. Create a Dataset from the previous DataSource. 
3. Request a ScannerBuilder from previous Dataset. 
4. Add filter expression to ScannerBuilder (and other options). 
5. Finalize into a Scan operation. 
6. Materialize into an arrow::Table.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6730) [CI] Use Github Actions for "C++ with clang 7" docker image

2019-09-27 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6730:
-

 Summary: [CI] Use Github Actions for "C++ with clang 7" docker 
image
 Key: ARROW-6730
 URL: https://issues.apache.org/jira/browse/ARROW-6730
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6615) [C++] Add filtering option to fs::Selector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6615:
-

 Summary: [C++] Add filtering option to fs::Selector
 Key: ARROW-6615
 URL: https://issues.apache.org/jira/browse/ARROW-6615
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


It would be convenient if Selector could support file path filtering, either via a 
regex or globbing applied to the path.

This is semi-required for filtering files in Dataset to properly apply the file 
format.
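
As an illustration of both variants, a sketch using only the standard library: a glob is translated to a regex and the regex is matched against the full path. `GlobToRegex` and `FilterPaths` are hypothetical helpers, not part of fs::Selector, and only `*` and `?` are supported as wildcards (neither matches `/`).

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Translates a simple glob into an ECMAScript regex.
std::string GlobToRegex(const std::string& glob) {
  std::string out;
  for (char c : glob) {
    switch (c) {
      case '*':
        out += "[^/]*";  // any run of characters within one path segment
        break;
      case '?':
        out += "[^/]";  // exactly one character within a path segment
        break;
      case '.': case '+': case '(': case ')': case '[': case ']':
      case '{': case '}': case '^': case '$': case '|': case '\\':
        out += '\\';  // escape regex metacharacters
        out += c;
        break;
      default:
        out += c;
    }
  }
  return out;
}

// Keeps only the paths that match the regex in full.
std::vector<std::string> FilterPaths(const std::vector<std::string>& paths,
                                     const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> kept;
  for (const auto& path : paths) {
    if (std::regex_match(path, re)) kept.push_back(path);
  }
  return kept;
}
```

With this, `FilterPaths(paths, GlobToRegex("data/*.parquet"))` keeps the Parquet files directly under `data/` and drops marker files and files in nested directories.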



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6614) [C++][Dataset] Implement FileSystemDataSourceDiscovery

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6614:
-

 Summary: [C++][Dataset] Implement FileSystemDataSourceDiscovery
 Key: ARROW-6614
 URL: https://issues.apache.org/jira/browse/ARROW-6614
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


DataSourceDiscovery is what allows inferring the schema and constructing a 
DataSource with a PartitionScheme.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6606) [C++] Construct tree structure from std::vector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6606:
-

 Summary: [C++] Construct tree structure from 
std::vector
 Key: ARROW-6606
 URL: https://issues.apache.org/jira/browse/ARROW-6606
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


This will be used by FileSystemDataSource for pushdown predicate pruning of 
branches.
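
A sketch of the construction, assuming the input vector holds `/`-separated path strings (the element type is an assumption). `PathNode`, `BuildTree` and `CountLeaves` are hypothetical names used for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// One directory or file; children are keyed by path segment.
struct PathNode {
  std::string name;
  std::map<std::string, std::unique_ptr<PathNode>> children;
};

// Builds a tree by walking each path segment by segment from a shared root.
std::unique_ptr<PathNode> BuildTree(const std::vector<std::string>& paths) {
  auto root = std::make_unique<PathNode>();
  for (const auto& path : paths) {
    PathNode* node = root.get();
    std::istringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/')) {
      if (segment.empty()) continue;  // tolerate leading or doubled slashes
      auto& child = node->children[segment];
      if (!child) {
        child = std::make_unique<PathNode>();
        child->name = segment;
      }
      node = child.get();
    }
  }
  return root;
}

// Counts nodes without children, i.e. the files.
std::size_t CountLeaves(const PathNode& node) {
  if (node.children.empty()) return 1;
  std::size_t total = 0;
  for (const auto& entry : node.children) total += CountLeaves(*entry.second);
  return total;
}
```

Pruning a branch then amounts to not descending into a child, for example skipping the whole `year=2019` subtree when the predicate excludes that partition.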



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6605) [C++] Add recursion depth control to fs::Selector

2019-09-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6605:
-

 Summary: [C++] Add recursion depth control to fs::Selector
 Key: ARROW-6605
 URL: https://issues.apache.org/jira/browse/ARROW-6605
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Francois Saint-Jacques


This is similar to the recursive option, but also controls the depth.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-6476) [Java][CI] Travis java all-jdks job is broken

2019-09-06 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6476:
-

 Summary: [Java][CI] Travis java all-jdks job is broken
 Key: ARROW-6476
 URL: https://issues.apache.org/jira/browse/ARROW-6476
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


Introduced by ARROW-6433: fixing the shade check enabled evaluation of the 
incorrect body. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6448) [CI] Add crossbow notifications

2019-09-03 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6448:
-

 Summary: [CI] Add crossbow notifications
 Key: ARROW-6448
 URL: https://issues.apache.org/jira/browse/ARROW-6448
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6397) [C++][CI] Fix S3 minio failure

2019-08-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6397:
-

 Summary: [C++][CI] Fix S3 minio failure
 Key: ARROW-6397
 URL: https://issues.apache.org/jira/browse/ARROW-6397
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Continuous Integration
Reporter: Francois Saint-Jacques


See 
[https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/27065941/job/gwjmr2hudm7693ef]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6396) [C++] Add CompareOptions to Compare kernels

2019-08-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6396:
-

 Summary: [C++] Add CompareOptions to Compare kernels
 Key: ARROW-6396
 URL: https://issues.apache.org/jira/browse/ARROW-6396
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


This would add an enum ResolveNull { KLEENE_LOGIC, NULL_PROPAGATE }.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6378) [C++][Dataset] Implement TreeDataSource

2019-08-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6378:
-

 Summary: [C++][Dataset] Implement TreeDataSource
 Key: ARROW-6378
 URL: https://issues.apache.org/jira/browse/ARROW-6378
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques


The TreeDataSource is required to support partition pruning of sub-trees.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6341) [Python] Implements low-level bindings to Dataset classes:

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6341:
-

 Summary: [Python] Implements low-level bindings to Dataset classes:
 Key: ARROW-6341
 URL: https://issues.apache.org/jira/browse/ARROW-6341
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Francois Saint-Jacques


The following classes should be accessible from Python:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6340) [R] Implements low-level bindings to Dataset classes

2019-08-23 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6340:
-

 Summary: [R] Implements low-level bindings to Dataset classes
 Key: ARROW-6340
 URL: https://issues.apache.org/jira/browse/ARROW-6340
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Francois Saint-Jacques


The following classes should be accessible from R:

* class DataSource
* class DataFragment
* function DiscoverySource
* class ScanContext, ScanOptions, ScanTask
* class Dataset
* class ScannerBuilder
* class Scanner

The end result is reading a directory of parquet files as a single stream.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6244) [C++] Implement Partition DataSource

2019-08-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6244:
-

 Summary: [C++] Implement Partition DataSource
 Key: ARROW-6244
 URL: https://issues.apache.org/jira/browse/ARROW-6244
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


This is a DataSource that also has partition metadata. The end goal is to 
support filtering with a DataSelector/Filter expression. The initial 
implementation should not deal with PartitionScheme yet.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-6242) [C++] Implements basic Dataset/Scanner/ScannerBuilder

2019-08-14 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6242:
-

 Summary: [C++] Implements basic Dataset/Scanner/ScannerBuilder
 Key: ARROW-6242
 URL: https://issues.apache.org/jira/browse/ARROW-6242
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Francois Saint-Jacques


The goal of this would be to iterate over a Dataset and generate a "flattened" 
stream of RecordBatches from the union of data sources and data fragments. This 
should not bother with filtering yet.





[jira] [Created] (ARROW-6161) [C++] Implements dataset::ParquetFile and associated Scan structures

2019-08-07 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6161:
-

 Summary: [C++] Implements dataset::ParquetFile and associated Scan 
structures
 Key: ARROW-6161
 URL: https://issues.apache.org/jira/browse/ARROW-6161
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6148) Missing debian build dependencies

2019-08-06 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6148:
-

 Summary: Missing debian build dependencies
 Key: ARROW-6148
 URL: https://issues.apache.org/jira/browse/ARROW-6148
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6124) [C++] IsIn kernel should sort in a single pass (with nulls)

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6124:
-

 Summary: [C++] IsIn kernel should sort in a single pass (with 
nulls)
 Key: ARROW-6124
 URL: https://issues.apache.org/jira/browse/ARROW-6124
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.15.0
Reporter: Francois Saint-Jacques


There's a good chance that a merge sort will have to be implemented (to support
spilling to disk, ChunkedArray inputs, ...).





[jira] [Created] (ARROW-6123) [C++] IsIn kernel should not materialize the output internal

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6123:
-

 Summary: [C++] IsIn kernel should not materialize the output 
internal
 Key: ARROW-6123
 URL: https://issues.apache.org/jira/browse/ARROW-6123
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


It should use the helpers since the output size is known.





[jira] [Created] (ARROW-6122) [C++] IsIn kernel must support FixedSizeBinary

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6122:
-

 Summary: [C++] IsIn kernel must support FixedSizeBinary
 Key: ARROW-6122
 URL: https://issues.apache.org/jira/browse/ARROW-6122
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.15.0
Reporter: Francois Saint-Jacques








[jira] [Created] (ARROW-6121) [Tools] Improve merge tool cli ergonomic

2019-08-02 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-6121:
-

 Summary: [Tools] Improve merge tool cli ergonomic
 Key: ARROW-6121
 URL: https://issues.apache.org/jira/browse/ARROW-6121
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


* Accepts the pull-request number as an optional (first) parameter to the script
* Supports reading the Jira username/password from a file
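
One possible shape for these two changes, sketched with argparse. The option
name --credentials-file, the JSON layout of the file and the helper names are
assumptions made for illustration; they are not taken from the existing merge
script.

```python
import argparse
import json
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Merge tool CLI (sketch)")
    # nargs="?" makes the first positional argument optional.
    parser.add_argument(
        "pr_number",
        nargs="?",
        type=int,
        default=None,
        help="pull-request number; prompt for it when omitted",
    )
    # Hypothetical flag: path to a file holding the Jira credentials.
    parser.add_argument(
        "--credentials-file",
        default=None,
        help="JSON file with 'username' and 'password' keys (assumed layout)",
    )
    return parser


def load_credentials(path: str) -> Tuple[str, str]:
    # Assumed file format: {"username": "...", "password": "..."}
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data["username"], data["password"]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["1234", "--credentials-file", "jira.json"])
print(args.pr_number, args.credentials_file)
```

When pr_number is None the script can fall back to its interactive prompt, so
existing usage keeps working.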





[jira] [Created] (ARROW-5923) [C++] Fix int96 comment

2019-07-12 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5923:
-

 Summary: [C++] Fix int96 comment
 Key: ARROW-5923
 URL: https://issues.apache.org/jira/browse/ARROW-5923
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Francois Saint-Jacques
Assignee: Micah Kornfield








[jira] [Created] (ARROW-5914) [CI] Build bundled dependencies in docker build step

2019-07-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5914:
-

 Summary: [CI] Build bundled dependencies in docker build step
 Key: ARROW-5914
 URL: https://issues.apache.org/jira/browse/ARROW-5914
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


In the recently introduced ARROW-5803, some heavy dependencies (thrift, 
protobuf, flatbuffers, grpc) are built at each invocation of docker-compose 
build (and thus on each Travis test run).

We should aim to build the third-party dependencies in the docker build phase 
instead, to exploit caching and docker-compose pull, so that the CI step doesn't 
need to build said dependencies each time.




