[GitHub] [arrow] github-actions[bot] commented on pull request #7672: ARROW-9348: [C++] Replace usages of TestBase::MakeRandomArray in testing/gtest_util.h with RandomArrayGenerator

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7672:
URL: https://github.com/apache/arrow/pull/7672#issuecomment-655295896


   https://issues.apache.org/jira/browse/ARROW-9348



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] arw2019 opened a new pull request #7672: ARROW-9348: [C++] Replace usages of TestBase::MakeRandomArray in testing/gtest_util.h with RandomArrayGenerator

2020-07-07 Thread GitBox


arw2019 opened a new pull request #7672:
URL: https://github.com/apache/arrow/pull/7672


   This PR addresses https://issues.apache.org/jira/browse/ARROW-9348



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


houqp commented on a change in pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#discussion_r451278084



##
File path: rust/arrow/src/record_batch.rs
##
@@ -216,15 +216,28 @@ impl Into for RecordBatch {
 }
 }
 
-/// Definition of record batch reader.
+/// Trait for types that can read `RecordBatch`'s in a single-threaded context.
 pub trait RecordBatchReader {
-/// Returns schemas of this record batch reader.
-/// Implementation of this trait should guarantee that all record batches 
returned
-/// by this reader should have same schema as returned from this method.
+/// Returns the schema of this `RecordBatchReader`.
+///
+/// Implementation of this trait should guarantee that all `RecordBatch`'s 
returned by this
+/// reader should have the same schema as returned from this method.
 fn schema( self) -> SchemaRef;
 
-/// Returns next record batch.
-fn next_batch( self) -> Result>;
+/// Reads the next `RecordBatch`.
+fn next( self) -> Result>;

Review comment:
   If we are to introduce breaking change in this patch set, I think it's 
better to use `next_batch` or other method name instead of `next` everywhere so 
it's easier for users to implement Iterator trait for the same type if they 
want.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


houqp commented on a change in pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#discussion_r451273911



##
File path: rust/datafusion/src/datasource/datasource.rs
##
@@ -20,13 +20,13 @@
 use std::sync::{Arc, Mutex};
 
 use arrow::datatypes::Schema;
+use arrow::record_batch::SendableRecordBatchReader;
 
 use crate::error::Result;
-use crate::execution::physical_plan::BatchIterator;
 
-/// Returned by implementors of `Table#scan`, this `BatchIterator` is wrapped 
with
+/// Returned by implementors of `Table#scan`, this `SendableRecordBatchReader` 
is wrapped with
 /// an `Arc` and `Mutex` so that it can be shared across threads as it is used.
-pub type ScanResult = Arc>;
+pub type ScanResult = Arc>;

Review comment:
   Do we still need to wrap it with Arc and Mutex when 
`SendableRecordBatchReader` already implements Send and Sync? It looks like 
`ScanResult` can just be an alias to `SendableRecordBatchReader` or 
`Arc>`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] houqp commented on a change in pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


houqp commented on a change in pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#discussion_r451273265



##
File path: rust/arrow/src/record_batch.rs
##
@@ -216,15 +216,28 @@ impl Into for RecordBatch {
 }
 }
 
-/// Definition of record batch reader.
+/// Trait for types that can read `RecordBatch`'s in a single-threaded context.
 pub trait RecordBatchReader {
-/// Returns schemas of this record batch reader.
-/// Implementation of this trait should guarantee that all record batches 
returned
-/// by this reader should have same schema as returned from this method.
+/// Returns the schema of this `RecordBatchReader`.
+///
+/// Implementation of this trait should guarantee that all `RecordBatch`'s 
returned by this
+/// reader should have the same schema as returned from this method.
 fn schema( self) -> SchemaRef;
 
-/// Returns next record batch.
-fn next_batch( self) -> Result>;
+/// Reads the next `RecordBatch`.
+fn next( self) -> Result>;
+}
+
+/// Trait for types that can read `RecordBatch`'s in a multi-threaded context.
+pub trait SendableRecordBatchReader: Send + Sync {

Review comment:
   Nitpick here because naming is hard and i don't have a good answer to 
this either ;P But the sendable prefix could be changed something more accurate 
since this trait also requires sync.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] praveenbingo closed pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva

2020-07-07 Thread GitBox


praveenbingo closed pull request #7642:
URL: https://github.com/apache/arrow/pull/7642


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


stevengj edited a comment on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   > It seems utf8proc (incorrectly?) claims some undefined codepoints (e.g. 
https://www.compart.com/en/unicode/U+08BE) are UTF8PROC_CATEGORY_LO (General 
category Letter Other).
   
   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was 
defined in Unicode 13, and category Lo is correct for that character.   It 
sounds like you may be looking at obsolete Unicode tables?
   
   > utf8proc doesn't store and expose the information if a codepoint is of a 
Numeric type
   
   Can't you use the Unicode category (N*) for this?  That's [what Julia 
does](https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl#L405).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


stevengj edited a comment on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   > It seems utf8proc (incorrectly?) claims some undefined codepoints (e.g. 
https://www.compart.com/en/unicode/U+08BE) are UTF8PROC_CATEGORY_LO (General 
category Letter Other).
   
   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was 
defined in Unicode 13, and category Lo is correct.   It sounds like you may be 
looking at obsolete Unicode tables?
   
   > utf8proc doesn't store and expose the information if a codepoint is of a 
Numeric type
   
   Can't you use the Unicode category (N*) for this?  That's [what Julia 
does](https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl#L405).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


stevengj edited a comment on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was 
defined in Unicode 13, and category Lo is correct.   It sounds like you may be 
looking at obsolete Unicode tables?
   
   > utf8proc doesn't store and expose the information if a codepoint is of a 
Numeric type
   
   Can't you use the Unicode category (N*) for this?  That's [what Julia 
does](https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl#L405).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] stevengj edited a comment on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


stevengj edited a comment on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was 
defined in Unicode 13, and category Lo is correct.   It sounds like you may be 
looking at obsolete Unicode tables?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] stevengj commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


stevengj commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655253087


   [U+08BE](https://www.fileformat.info/info/unicode/char/08be/index.htm) was 
defined in Unicode 13, and category Lo is correct.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


BryanCutler commented on pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#issuecomment-655246147


   On a related note, it seems like our netty version 4.1.27 is pretty old now, 
~2 years, do you all think it would be good to upgrade this before the 1.0.0 
release? It looks like there is a security vulnerability pre 4.1.44 
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-20445



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


BryanCutler commented on pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#issuecomment-655244616


   I agree that the recommended allocator should still be the netty one for 
now, so I guess it wouldn't be good to bundle the unsafe allocator as a 
possible default. I'm good with raising an error then if a default allocator is 
not on the classpath, as long as it's clear to the user what they need to do. 
Right now it looks like the error is:
   ```
   java.lang.RuntimeException: No DefaultAllocationManager found on classpath. 
Can't allocate Arrow buffers.
   ```
   @rymurr would you mind adding something to the message like "it is 
recommended to add `arrow-memory-netty` or `arrow-memory-unsafe` as a 
dependency to provide a `DefaultAllocatorManager"?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


liyafan82 commented on pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#issuecomment-655232313


   > Looks fine for the most part, but I'm not really sure why we need to 
separate `arrow-memory-core` and `arrow-memory-unsafe`? Couldn't those be 
combined since it wouldn't add any other dependencies, and that would also 
simplify things. Plus, it doesn't really make sense to have a module 
`arrow-memory-core` without a default allocator that would probably build fine 
with `arrow-vector`, but then blow up at runtime. What do you think @rymurr and 
@liyafan82 ?
   
   @BryanCutler I agree with you that it makes things simpler. 
   
   Since we may need to continue to use netty implementation as the default one 
(as indicated by @jacques-n ), maybe it is beneficial to keep the 
arrow-memory-unsafe module, at least for now. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


liyafan82 commented on a change in pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#discussion_r451235584



##
File path: java/memory/memory-core/pom.xml
##
@@ -0,0 +1,65 @@
+
+
+http://maven.apache.org/POM/4.0.0;
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+  
+arrow-memory
+org.apache.arrow
+1.0.0-SNAPSHOT
+  
+  4.0.0
+
+  arrow-memory-core
+
+  Arrow Memory - Core
+  Core off-heap memory management libraries for Arrow 
ValueVectors.
+
+  
+
+  com.google.code.findbugs
+  jsr305
+
+
+  org.slf4j
+  slf4j-api
+
+
+  org.immutables
+  value
+
+  
+
+  
+
+  
+maven-surefire-plugin
+3.0.0-M3
+
+  true
+  true
+  ${forkCount}
+  true
+  
+${project.build.directory}
+
true
+
1048576
+UTC
+  
+  
+  -Darrow.vector.max_allocation_bytes=1048576

Review comment:
   This seems duplicate with the system property defined above





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sagnikc-dremio commented on pull request #7641: ARROW-9328: [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string

2020-07-07 Thread GitBox


sagnikc-dremio commented on pull request #7641:
URL: https://github.com/apache/arrow/pull/7641#issuecomment-655228953


   @pprudhvi @projjal Can you please review this change?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#issuecomment-655212330


   https://issues.apache.org/jira/browse/ARROW-8344



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jacques-n commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


jacques-n commented on pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#issuecomment-655210526


   > Looks fine for the most part, but I'm not really sure why we need to 
separate `arrow-memory-core` and `arrow-memory-unsafe`? Couldn't those be 
combined since it wouldn't add any other dependencies, and that would also 
simplify things. Plus, it doesn't really make sense to have a module 
`arrow-memory-core` without a default allocator that would probably build fine 
with `arrow-vector`, but then blow up at runtime. What do you think @rymurr and 
@liyafan82 ?
   
   This is modeled after the slf4j pattern where the logging implementation is 
separated from the core apis. That way the default pattern can people sourcing 
the desired allocator via dependency without having to configure one. This 
seems much cleaner that the previous approaches where people had to manually 
configure or override via system properties. 
   
   Additionally, I'd add that for new users I think we would suggest using the 
Netty one, not the unsafe one. It is much more complete/comprehensive and 
intelligent. So having a default implementation that we always tell people to 
override seems worse than they having a hard failure if they don't source any. 
If we want to make things easier, we could also introduce some vector + 
allocator depedency poms and then use those in documentation, etc.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


mr-smidge commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r451216615



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -237,7 +329,9 @@ public ReadOnlySpan GetBytes(int index)
 
 if (IsNull(index))
 {
-return null;
+// Note that `return null;` is valid syntax, but would be 
misleading as `null` in the context of a span
+// is actually returned as an empty span.
+return ReadOnlySpan.Empty;

Review comment:
   This clause previously said `return null;`, which was misleading as it 
suggested that the method could be used to identify null values in the array.  
However, `null` in the context of a `ReadOnlySpan` is the same as the empty 
span, and so the method can't distinguish null values from empty ones:
   
   ```csharp
   Assert.True(ReadOnlySpan.Empty == null)
   ```
   
   The new code is clearer in intent.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


mr-smidge commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r451215646



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -173,11 +245,19 @@ public TBuilder Set(int index, byte value)
 throw new NotImplementedException();
 }
 
+/// 
+/// Clear all contents appended so far.
+/// 
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder Clear()
 {
 ValueOffsets.Clear();
 ValueBuffer.Clear();
 ValidityBuffer.Clear();
+
+// Always write the first offset before anything has been 
written.
+Offset = 0;

Review comment:
   Not resetting the `Offset` member variable was the root cause of the 
string-clearing bug referenced in ARROW-8344.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


mr-smidge commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r451215414



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
 ValueOffsets = new ArrowBuffer.Builder();
 ValueBuffer = new ArrowBuffer.Builder();
 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+// From the docs:
+//
+// The offsets buffer contains length + 1 signed integers 
(either 32-bit or 64-bit, depending on the
+// logical type), which encode the start position of each slot 
in the data buffer. The length of the
+// value in each slot is computed using the difference between 
the offset at that slot’s index and the
+// subsequent offset.
+//
+// In this builder, we choose to append the first offset 
(zero) upon construction, and each trailing
+// offset is then added after each individual item has been 
appended.
+ValueOffsets.Append(this.Offset);
 }
 
 protected abstract TArray Build(ArrayData data);
 
-public int Length => ValueOffsets.Length;
+/// 
+/// Gets the length of the array built so far.
+/// 
+public int Length => ValueOffsets.Length - 1;
 
+/// 
+/// Build an Arrow array from the appended contents so far.
+/// 
+/// Optional memory allocator.
+/// Returns an array of type .
 public TArray Build(MemoryAllocator allocator = default)
 {
-ValueOffsets.Append(Offset);
-
-ArrowBuffer validityBuffer = NullCount > 0
-? ValidityBuffer.Build(allocator)
-: ArrowBuffer.Empty;
-
-var data = new ArrayData(DataType, ValueOffsets.Length - 1, 
NullCount, 0,
-new[] { validityBuffer, ValueOffsets.Build(allocator), 
ValueBuffer.Build(allocator) });
+var bufs = new[]
+{
+NullCount > 0 ? ValidityBuffer.Build(allocator) : 
ArrowBuffer.Empty,
+ValueOffsets.Build(allocator),
+ValueBuffer.Build(allocator),
+};
+var data = new ArrayData(
+DataType,
+length: ValueOffsets.Length - 1,
+NullCount,
+offset: 0,
+bufs);
 
 return Build(data);
 }
 
+/// 
+/// Append a single null value to the array.
+/// 
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder AppendNull()
 {
-ValueOffsets.Append(Offset);
+// Do not add to the value buffer in the case of a null.
+// Note that we do not need to increment the offset as a 
result.
 ValidityBuffer.Append(false);
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
+/// 
+/// Appends a value, consisting of a single byte, to the array.
+/// 
+/// Byte value to append.
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder Append(byte value)
 {
-ValueOffsets.Append(Offset);
 ValueBuffer.Append(value);
-Offset++;
 ValidityBuffer.Append(true);
+Offset++;
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
+/// 
+/// Append a value, consisting of a span of bytes, to the array.
+/// 
+/// 
+/// Note that a single value is added, which consists of 
arbitrarily many bytes.  If multiple values are
+/// to be added, use the  method.
+/// 
+/// Span of bytes to add.
+/// Returns the builder (for fluent-style 
composition).
 public TBuilder Append(ReadOnlySpan span)
 {
-ValueOffsets.Append(Offset);
 ValueBuffer.Append(span);
 ValidityBuffer.Append(true);
 Offset += span.Length;
+ValueOffsets.Append(Offset);
 return Instance;
 }
 
-public TBuilder AppendRange(IEnumerable values)
+/// 
+/// Append a value, consisting of an enumerable collection of 
bytes, to the array.
+/// 
+/// 
+/// Note that this method appends a single value, which may 
consist of arbitrarily many bytes.  If multiple
+ 

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


mr-smidge commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r451214556



##
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
 ValueOffsets = new ArrowBuffer.Builder();
 ValueBuffer = new ArrowBuffer.Builder();
 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+// From the docs:
+//
+// The offsets buffer contains length + 1 signed integers 
(either 32-bit or 64-bit, depending on the
+// logical type), which encode the start position of each slot 
in the data buffer. The length of the
+// value in each slot is computed using the difference between 
the offset at that slot’s index and the
+// subsequent offset.
+//
+// In this builder, we choose to append the first offset 
(zero) upon construction, and each trailing
+// offset is then added after each individual item has been 
appended.
+ValueOffsets.Append(this.Offset);

Review comment:
   For an array with N items, there need to be N+1 offset values written.
   
   Previously, an offset was written with each `Append*()` call, with the extra 
one written when calling `Build()`.  This PR flips this around to add the extra 
offset upon construction (or calling `Clear()`), with the others written 
post-hoc during `Append*()` calls.  As such, `Build()` is now idempotent.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mr-smidge opened a new pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

2020-07-07 Thread GitBox


mr-smidge opened a new pull request #7671:
URL: https://github.com/apache/arrow/pull/7671


   This PR fixes a few bugs in `BinaryArray.Builder()`:
   * Fixes the `Clear()` method, which previously would break all 
subsequently-appended values (see JIRA ticket for examples).
   * Makes the `Build()` method idempotent, by making sure the builder's 
internal arrays are not modified when it is called.
   
   And makes a few other improvements:
   * Comprehensive test coverage of the builder's key methods (this was written 
in proper TDD-style, so these tests would fail if run on `master`).
   * XML docstrings for all methods affected.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


kou closed pull request #7669:
URL: https://github.com/apache/arrow/pull/7669


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


kou commented on pull request #7669:
URL: https://github.com/apache/arrow/pull/7669#issuecomment-655204658


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #7663: Sorting on pyarrow data structures ?

2020-07-07 Thread GitBox


wesm commented on issue #7663:
URL: https://github.com/apache/arrow/issues/7663#issuecomment-655197308


   This isn't where we handle feature requests. There are some sorting-related 
issues in JIRA; if you do not find one that describes the APIs are you are 
looking for, could you open a new one? Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm closed issue #7663: Sorting on pyarrow data structures ?

2020-07-07 Thread GitBox


wesm closed issue #7663:
URL: https://github.com/apache/arrow/issues/7663


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-07 Thread GitBox


lidavidm commented on pull request #7664:
URL: https://github.com/apache/arrow/pull/7664#issuecomment-655193980


   Just a high level comment: if I'm reading this right, V4 is still the 
default metadata version and applications opt in to V5 when they want to 
read/write unions. Am I understanding this right?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nickpoorman commented on pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox


nickpoorman commented on pull request #7670:
URL: https://github.com/apache/arrow/pull/7670#issuecomment-655192799


   @stuartcarnie @sbinet 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7670:
URL: https://github.com/apache/arrow/pull/7670#issuecomment-655153093


   https://issues.apache.org/jira/browse/ARROW-9365



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7669:
URL: https://github.com/apache/arrow/pull/7669#issuecomment-655153095


   https://issues.apache.org/jira/browse/ARROW-9351



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nickpoorman opened a new pull request #7670: ARROW-9365: [Go] Added the rest of the implemented array builders to NewBuilder

2020-07-07 Thread GitBox


nickpoorman opened a new pull request #7670:
URL: https://github.com/apache/arrow/pull/7670


   This PR adds the rest of the implemented typed array builders to the 
NewBuilder function. I ran into needing this because `NewStructBuilder` 
internally calls `NewBuilder`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r451159113



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc
##
@@ -0,0 +1,223 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/common.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_writer.h"
+#include "arrow/visitor_inline.h"
+
+namespace arrow {
+
+namespace compute {
+namespace internal {
+namespace {
+
+template 
+using enable_if_supports_fill_null = enable_if_t::value, R>;
+
+template 
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool* pool) {}
+
+  Status Init(const FillNullOptions& options) {
+fill_value = options.fill_value.scalar();
+return Status::OK();
+  }
+
+  std::shared_ptr fill_value;
+};
+
+template <>
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool*) {}
+
+  Status Init(const FillNullOptions& options) { return Status::OK(); }
+
+  std::shared_ptr fill_value;
+};
+
+struct InitFillNullStateVisitor {
+  KernelContext* ctx;
+  const FillNullOptions* options;
+  std::unique_ptr result;
+
+  InitFillNullStateVisitor(KernelContext* ctx, const FillNullOptions* options)
+  : ctx(ctx), options(options) {}
+
+  template 
+  Status Init() {
+using StateType = FillNullState;
+result.reset(new StateType(ctx->exec_context()->memory_pool()));
+return static_cast(result.get())->Init(*options);
+  }
+
+  Status Visit(const DataType&) { return Init(); }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+return Init();
+  }
+
+  Status GetResult(std::unique_ptr* out) {
+RETURN_NOT_OK(VisitTypeInline(*options->fill_value.type(), this));
+*out = std::move(result);
+return Status::OK();
+  }
+};
+
+std::unique_ptr InitFillNull(KernelContext* ctx,
+  const KernelInitArgs& args) {
+  InitFillNullStateVisitor visitor{ctx,
+   static_cast(args.options)};
+  std::unique_ptr result;
+  ctx->SetStatus(visitor.GetResult());
+  return result;
+}
+
+struct ScalarFillVisitor {
+  KernelContext* ctx;
+  const ArrayData& data;
+  Datum* out;
+
+  ScalarFillVisitor(KernelContext* ctx, const ArrayData& data, Datum* out)
+  : ctx(ctx), data(data), out(out) {}
+
+  Status Visit(const DataType&) {
+ArrayData* out_arr = out->mutable_array();
+*out_arr = data;
+return Status::OK();
+  }
+
+  Status Visit(const BooleanType&) {
+const auto& state = checked_cast&>(*ctx->state());
+bool value = UnboxScalar::Unbox(*state.fill_value);
+ArrayData* out_arr = out->mutable_array();
+FirstTimeBitmapWriter bit_writer(out_arr->buffers[1]->mutable_data(), 
out_arr->offset,
+ out_arr->length);
+FirstTimeBitmapWriter 
bit_writer_validity(out_arr->buffers[0]->mutable_data(),
+  out_arr->offset, 
out_arr->length);
+if (data.null_count != 0) {
+  BitmapReader bit_reader(data.buffers[1]->data(), data.offset, 
data.length);
+  BitmapReader bit_reader_validity(data.buffers[0]->data(), data.offset, 
data.length);
+  for (int64_t i = 0; i < data.length; i++) {
+if (bit_reader_validity.IsNotSet()) {
+  if (value == true) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+} else {
+  if (bit_reader.IsSet()) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+}
+bit_reader.Next();
+bit_writer.Next();
+bit_reader_validity.Next();
+bit_writer_validity.Next();
+  }
+  bit_writer_validity.Finish();
+  bit_writer.Finish();
+} else {
+  *out_arr = data;
+}
+return Status::OK();
+  }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+using T = typename GetViewType::T;
+const 

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r451159046



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc
##
@@ -0,0 +1,223 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/common.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_writer.h"
+#include "arrow/visitor_inline.h"
+
+namespace arrow {
+
+namespace compute {
+namespace internal {
+namespace {
+
+template 
+using enable_if_supports_fill_null = enable_if_t::value, R>;
+
+template 
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool* pool) {}
+
+  Status Init(const FillNullOptions& options) {
+fill_value = options.fill_value.scalar();
+return Status::OK();
+  }
+
+  std::shared_ptr fill_value;
+};
+
+template <>
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool*) {}
+
+  Status Init(const FillNullOptions& options) { return Status::OK(); }
+
+  std::shared_ptr fill_value;
+};
+
+struct InitFillNullStateVisitor {
+  KernelContext* ctx;
+  const FillNullOptions* options;
+  std::unique_ptr result;
+
+  InitFillNullStateVisitor(KernelContext* ctx, const FillNullOptions* options)
+  : ctx(ctx), options(options) {}
+
+  template 
+  Status Init() {
+using StateType = FillNullState;
+result.reset(new StateType(ctx->exec_context()->memory_pool()));
+return static_cast(result.get())->Init(*options);
+  }
+
+  Status Visit(const DataType&) { return Init(); }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+return Init();
+  }
+
+  Status GetResult(std::unique_ptr* out) {
+RETURN_NOT_OK(VisitTypeInline(*options->fill_value.type(), this));
+*out = std::move(result);
+return Status::OK();
+  }
+};
+
+std::unique_ptr InitFillNull(KernelContext* ctx,
+  const KernelInitArgs& args) {
+  InitFillNullStateVisitor visitor{ctx,
+   static_cast(args.options)};
+  std::unique_ptr result;
+  ctx->SetStatus(visitor.GetResult());
+  return result;
+}
+
+struct ScalarFillVisitor {
+  KernelContext* ctx;
+  const ArrayData& data;
+  Datum* out;
+
+  ScalarFillVisitor(KernelContext* ctx, const ArrayData& data, Datum* out)
+  : ctx(ctx), data(data), out(out) {}
+
+  Status Visit(const DataType&) {
+ArrayData* out_arr = out->mutable_array();
+*out_arr = data;
+return Status::OK();
+  }
+
+  Status Visit(const BooleanType&) {
+const auto& state = checked_cast&>(*ctx->state());
+bool value = UnboxScalar::Unbox(*state.fill_value);
+ArrayData* out_arr = out->mutable_array();
+FirstTimeBitmapWriter bit_writer(out_arr->buffers[1]->mutable_data(), 
out_arr->offset,
+ out_arr->length);
+FirstTimeBitmapWriter 
bit_writer_validity(out_arr->buffers[0]->mutable_data(),
+  out_arr->offset, 
out_arr->length);
+if (data.null_count != 0) {
+  BitmapReader bit_reader(data.buffers[1]->data(), data.offset, 
data.length);
+  BitmapReader bit_reader_validity(data.buffers[0]->data(), data.offset, 
data.length);
+  for (int64_t i = 0; i < data.length; i++) {
+if (bit_reader_validity.IsNotSet()) {
+  if (value == true) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+} else {
+  if (bit_reader.IsSet()) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+}
+bit_reader.Next();
+bit_writer.Next();
+bit_reader_validity.Next();
+bit_writer_validity.Next();
+  }
+  bit_writer_validity.Finish();
+  bit_writer.Finish();
+} else {
+  *out_arr = data;
+}
+return Status::OK();
+  }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+using T = typename GetViewType::T;
+const 

[GitHub] [arrow] github-actions[bot] commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7669:
URL: https://github.com/apache/arrow/pull/7669#issuecomment-655148194


   Revision: 3d5cc704cda625df93aa045e17860fc5d5ea62d5
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-387](https://github.com/ursa-labs/crossbow/branches/all?query=actions-387)
   
   |Task|Status|
   ||--|
   
|test-ubuntu-18.04-cpp-cmake32|[![CircleCI](https://img.shields.io/circleci/build/github/ursa-labs/crossbow/actions-387-circle-test-ubuntu-18.04-cpp-cmake32.svg)](https://circleci.com/gh/ursa-labs/crossbow/tree/actions-387-circle-test-ubuntu-18.04-cpp-cmake32)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


kou commented on pull request #7669:
URL: https://github.com/apache/arrow/pull/7669#issuecomment-655147282


   @github-actions crossbow submit test-ubuntu-18.04-cpp-cmake32



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou opened a new pull request #7669: ARROW-9351: [C++] Fix CMake 3.2 detection in option value validation

2020-07-07 Thread GitBox


kou opened a new pull request #7669:
URL: https://github.com/apache/arrow/pull/7669


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-07 Thread GitBox


kou commented on a change in pull request #7589:
URL: https://github.com/apache/arrow/pull/7589#discussion_r451154810



##
File path: dev/release/post-09-docs.sh
##
@@ -42,20 +47,20 @@ popd
 pushd "${ARROW_DIR}"
 git checkout "${release_tag}"

Review comment:
   Could you try the following change? I confirmed that this can build C++ 
and GLib with CUDA but Java build is failed... (I think that the Java build 
error is unrelated to this.)
   
   ```diff
   diff --git a/ci/docker/linux-apt-docs.dockerfile 
b/ci/docker/linux-apt-docs.dockerfile
   index 55ce2b87c..d0d98d5cd 100644
   --- a/ci/docker/linux-apt-docs.dockerfile
   +++ b/ci/docker/linux-apt-docs.dockerfile
   @@ -43,6 +43,7 @@ RUN apt-get update -y && \
libtool \
libxml2-dev \
ninja-build \
   +nvidia-cuda-toolkit \
openjdk-${jdk}-jdk-headless \
pandoc \
r-base=${r}* \
   diff --git a/dev/release/post-09-docs.sh b/dev/release/post-09-docs.sh
   index e81147a0e..cf983d066 100755
   --- a/dev/release/post-09-docs.sh
   +++ b/dev/release/post-09-docs.sh
   @@ -50,7 +50,7 @@ git checkout "${release_tag}"
archery docker run \
  -v "${ARROW_SITE_DIR}/docs:/build/docs" \
  -e ARROW_DOCS_VERSION="${version}" \
   -  ubuntu-cuda-docs
   +  ubuntu-docs

: ${PUSH:=1}

   diff --git a/docker-compose.yml b/docker-compose.yml
   index f124f28fb..37e0b054d 100644
   --- a/docker-compose.yml
   +++ b/docker-compose.yml
   @@ -65,7 +65,6 @@ x-ccache: 
x-with-gpus:
  - ubuntu-cuda-cpp
  - ubuntu-cuda-python
   -  - ubuntu-cuda-docs

x-hierarchy:
  # This section is used by the archery tool to enable building nested 
images,
   @@ -116,8 +115,7 @@ x-hierarchy:
  - ubuntu-docs
- ubuntu-r
  - ubuntu-cuda-cpp:
   -- ubuntu-cuda-python:
   -  - ubuntu-cuda-docs
   +- ubuntu-cuda-python
  - ubuntu-csharp
  - ubuntu-cpp-sanitizer
  - ubuntu-r-sanitizer
   @@ -1125,6 +1123,7 @@ services:
base: ${REPO}:${ARCH}-ubuntu-${UBUNTU}-python-3
environment:
  <<: *ccache
   +  ARROW_CUDA: "ON"
  ARROW_GLIB_GTK_DOC: "true"
volumes: *ubuntu-volumes
command:  >
   @@ -1137,30 +1136,6 @@ services:
/arrow/ci/scripts/r_build.sh /arrow true &&
/arrow/ci/scripts/docs_build.sh /arrow /build"

   -  ubuntu-cuda-docs:
   -# Usage:
   -#   docker-compose build ubuntu-cuda-cpp
   -#   docker-compose build ubuntu-cuda-python
   -#   docker-compose build ubuntu-cuda-docs
   -#   docker-compose run --rm ubuntu-cuda-docs
   -# Notes about --runtime and --gpus and docker-compose incompatibility
   -image: ${REPO}:${ARCH}-ubuntu-${UBUNTU}-cuda-${CUDA}-python-3-docs
   -build:
   -  context: .
   -  dockerfile: ci/docker/linux-apt-docs.dockerfile
   -  cache_from:
   -- ${REPO}:${ARCH}-ubuntu-${UBUNTU}-docs
   -  args:
   -jdk: ${JDK}
   -node: ${NODE}
   -base: ${REPO}:${ARCH}-ubuntu-${UBUNTU}-cuda-${CUDA}-python-3
   -environment:
   -  <<: *ccache
   -  ARROW_CUDA: "ON"
   -  ARROW_GLIB_GTK_DOC: "true"
   -volumes: *ubuntu-volumes
   -command: *docs-command
   -
  # Tools 
#

  ubuntu-lint:
   ```
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-07 Thread GitBox


BryanCutler commented on a change in pull request #7619:
URL: https://github.com/apache/arrow/pull/7619#discussion_r451102944



##
File path: java/adapter/orc/pom.xml
##
@@ -15,10 +15,16 @@
 
 
 org.apache.arrow
-arrow-memory
-${project.version}
+arrow-memory-core
+1.0.0-SNAPSHOT

Review comment:
   was this intentional?

##
File path: java/memory/memory-core/pom.xml
##
@@ -0,0 +1,65 @@
+
+
+http://maven.apache.org/POM/4.0.0;
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+  
+arrow-memory
+org.apache.arrow
+1.0.0-SNAPSHOT
+  
+  4.0.0
+
+  arrow-memory-core
+
+  Arrow Memory - Core
+  Core off-heap memory management libraries for Arrow 
ValueVectors.
+
+  
+
+  com.google.code.findbugs
+  jsr305
+
+
+  org.slf4j
+  slf4j-api
+
+
+  org.immutables
+  value
+
+  
+
+  
+
+  
+maven-surefire-plugin
+3.0.0-M3
+
+  true
+  true
+  ${forkCount}
+  true
+  
+${project.build.directory}
+
true

Review comment:
   is this necessary here?

##
File path: java/vector/pom.xml
##
@@ -50,6 +50,12 @@
   commons-codec
   1.10
 
+
+  org.apache.arrow
+  arrow-memory-netty
+  ${project.version}
+  test
+

Review comment:
   Maybe we should run the unit tests once for each of the 2 allocators?

##
File path: java/vector/pom.xml
##
@@ -50,6 +50,12 @@
   commons-codec
   1.10
 
+
+  org.apache.arrow
+  arrow-memory-netty
+  ${project.version}
+  test
+
 
   io.netty
   netty-buffer

Review comment:
   This could be removed now right?

##
File path: java/memory/memory-netty/pom.xml
##
@@ -0,0 +1,107 @@
+
+
+http://maven.apache.org/POM/4.0.0;
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+  
+arrow-memory
+org.apache.arrow
+1.0.0-SNAPSHOT
+  
+  4.0.0
+
+  arrow-memory-netty
+  Arrow Memory - Netty
+  Netty allocator and utils for allocating memory in 
Arrow
+
+  
+
+  org.apache.arrow
+  arrow-memory-core
+  ${project.version}
+
+
+  io.netty
+  netty-buffer
+
+
+  io.netty
+  netty-common
+
+
+  org.slf4j
+  slf4j-api
+
+
+  org.immutables
+  value
+
+  
+
+  
+
+  
+maven-surefire-plugin
+3.0.0-M3
+
+  true
+  true
+  ${forkCount}
+  true
+  
+${project.build.directory}
+
true
+
1048576
+UTC
+  
+  
+  -Darrow.vector.max_allocation_bytes=1048576
+
+  
+
+  
+
+  
+
+  
+  integration-tests
+  
+
+  
+org.apache.maven.plugins
+maven-failsafe-plugin
+
+  
+${project.build.directory}
+
true
+UTC
+  
+  
+
+
+  
+
+  integration-test
+  verify
+
+  
+
+  
+
+  
+
+  
+
+

Review comment:
   add newline?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7668: WIP ARROW-9187: [R] Add bindings for arithmetic and is_null kernels

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7668:
URL: https://github.com/apache/arrow/pull/7668#issuecomment-655124179


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


nealrichardson closed pull request #7665:
URL: https://github.com/apache/arrow/pull/7665


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655121582


   Revision: ad162848ecf90b691efa195cd53ce2787188dcb2
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-386](https://github.com/ursa-labs/crossbow/branches/all?query=actions-386)
   
   |Task|Status|
   ||--|
   
|wheel-manylinux1-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux1-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux1-cp35m)|
   
|wheel-manylinux1-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux1-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux1-cp36m)|
   
|wheel-manylinux1-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux1-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux1-cp37m)|
   
|wheel-manylinux1-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux1-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux1-cp38)|
   
|wheel-manylinux2010-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2010-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2010-cp35m)|
   
|wheel-manylinux2010-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2010-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2010-cp36m)|
   
|wheel-manylinux2010-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2010-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2010-cp37m)|
   
|wheel-manylinux2010-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2010-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2010-cp38)|
   
|wheel-manylinux2014-cp35m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2014-cp35m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2014-cp35m)|
   
|wheel-manylinux2014-cp36m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2014-cp36m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2014-cp36m)|
   
|wheel-manylinux2014-cp37m|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2014-cp37m)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2014-cp37m)|
   
|wheel-manylinux2014-cp38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-386-azure-wheel-manylinux2014-cp38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-386-azure-wheel-manylinux2014-cp38)|
   
|wheel-osx-cp35m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-386-travis-wheel-osx-cp35m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp36m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-386-travis-wheel-osx-cp36m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp37m|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-386-travis-wheel-osx-cp37m.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-osx-cp38|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-386-travis-wheel-osx-cp38.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|wheel-win-cp35m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-386-appveyor-wheel-win-cp35m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-386-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|
   

[GitHub] [arrow] kszucs commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


kszucs commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655120532


   @github-actions crossbow submit -g wheel



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7668: WIP ARROW-9187: [R] Add bindings for arithmetic and is_null kernels

2020-07-07 Thread GitBox


nealrichardson opened a new pull request #7668:
URL: https://github.com/apache/arrow/pull/7668


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()

2020-07-07 Thread GitBox


kou closed pull request #7652:
URL: https://github.com/apache/arrow/pull/7652


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()

2020-07-07 Thread GitBox


kou commented on pull request #7652:
URL: https://github.com/apache/arrow/pull/7652#issuecomment-655107335


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou closed pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder

2020-07-07 Thread GitBox


kou closed pull request #7649:
URL: https://github.com/apache/arrow/pull/7649


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder

2020-07-07 Thread GitBox


kou commented on pull request #7649:
URL: https://github.com/apache/arrow/pull/7649#issuecomment-655105964


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7667: ARROW-9339: [Rust] Comments on SIMD in Arrow README are incorrect

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7667:
URL: https://github.com/apache/arrow/pull/7667#issuecomment-655105632


   https://issues.apache.org/jira/browse/ARROW-9339



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#issuecomment-655105633


   https://issues.apache.org/jira/browse/ARROW-8559



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran opened a new pull request #7667: ARROW-9339: [Rust] Comments on SIMD in Arrow README are incorrect

2020-07-07 Thread GitBox


paddyhoran opened a new pull request #7667:
URL: https://github.com/apache/arrow/pull/7667


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran opened a new pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


paddyhoran opened a new pull request #7666:
URL: https://github.com/apache/arrow/pull/7666


   Most libraries that use `Arrow` are likely to want to define types that 
produce `RecordBatch`'s.  This PR defines the `RecordBatchReader` and 
`SendableRecordBatchReader` traits and updates other crates (in particular, 
DataFusion) to use these traits.  In short, I think these traits should be 
defined in the main arrow crate.
   
   It was mentioned on the JIRA that we could just use a single trait and allow 
users to implement `Send`/`Sync` as necessary.  The fact the the `schema` 
method returns a reference counted type makes this hard to do (`Rc` vs `Arc`) 
so I settled on two different traits.  This seemed like the easiest to do in 
advance of 1.0, maybe we can improve on this.
   
   As `next` returns `Arrow`'s error type I added a new variant to allow 
wrapping external errors once they implement the standard `Error` trait and 
`Send`, `Sync`.  I remove the `PartialEq` and `Clone` bounds on the DataFusion 
error type as we were only using these in testing (the tests are updates) and 
it's best to keep the requirements on external implementers as low a possible.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] paddyhoran commented on pull request #7666: ARROW-8559: [Rust] Consolidate Record Batch reader traits in main arrow crate

2020-07-07 Thread GitBox


paddyhoran commented on pull request #7666:
URL: https://github.com/apache/arrow/pull/7666#issuecomment-655100102


   cc @houqp



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7665:
URL: https://github.com/apache/arrow/pull/7665#issuecomment-655098668


   Revision: c939366716a7a8cb255222588c750263297da720
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-385](https://github.com/ursa-labs/crossbow/branches/all?query=actions-385)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-385-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655098011


   Revision: ad162848ecf90b691efa195cd53ce2787188dcb2
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-384](https://github.com/ursa-labs/crossbow/branches/all?query=actions-384)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp37m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-384-appveyor-wheel-win-cp37m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7665:
URL: https://github.com/apache/arrow/pull/7665#issuecomment-655097392


   @github-actions crossbow submit homebrew-cpp



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


kszucs commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655096717


   @github-actions crossbow submit wheel-win-cp37m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


kszucs commented on a change in pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#discussion_r451037044



##
File path: cpp/cmake_modules/ThirdpartyToolchain.cmake
##
@@ -2069,7 +2069,7 @@ macro(build_utf8proc)
   "-DCMAKE_INSTALL_PREFIX=${UTF8PROC_PREFIX}"
   -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
   -DCMAKE_INSTALL_LIBDIR=lib
-  -DDBUILD_SHARED_LIBS=OFF)
+  -DBUILD_SHARED_LIBS=OFF)

Review comment:
   ~I think this was the original issue, because utf8proc wasn't available 
from conda, so it was bundled.~
   Nope, it's available from conda.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7665:
URL: https://github.com/apache/arrow/pull/7665#issuecomment-655069200


   https://issues.apache.org/jira/browse/ARROW-9360



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7665:
URL: https://github.com/apache/arrow/pull/7665#issuecomment-655068477


   Revision: 23b6f0e3d5ad52dda308cf13352e4fa7f4a8a3d5
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-383](https://github.com/ursa-labs/crossbow/branches/all?query=actions-383)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-383-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7665:
URL: https://github.com/apache/arrow/pull/7665#issuecomment-655067765


   @github-actions crossbow submit homebrew-cpp



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #7665: ARROW-9360: [CI][Crossbow] Nightly homebrew-cpp job times out

2020-07-07 Thread GitBox


nealrichardson opened a new pull request #7665:
URL: https://github.com/apache/arrow/pull/7665


   After successfully building arrow, it then does
   
   ```
   ==> Upgrading 64 dependents:
   ansible 2.9.6_1 -> 2.9.10, ansible 2.9.6_1 -> 2.9.10, ansible 2.9.6_1 -> 
2.9.10, cairo 1.16.0_2 -> 1.16.0_3, cairo 1.16.0_2 -> 1.16.0_3, cairo 1.16.0_2 
-> 1.16.0_3, cairo 1.16.0_2 -> 1.16.0_3, cgal 5.0.2 -> 5.0.2_1, cgal 5.0.2 -> 
5.0.2_1, gdal 2.4.4_2 -> 3.1.1_1, gdal 2.4.4_2 -> 3.1.1_1, gdal 2.4.4_2 -> 
3.1.1_1, gdal 2.4.4_2 -> 3.1.1_1, gdal 2.4.4_2 -> 3.1.1_1, gdal 2.4.4_2 -> 
3.1.1_1, gdal 2.4.4_2 -> 3.1.1_1, gdal 2.4.4_2 -> 3.1.1_1, geos 3.8.1 -> 
3.8.1_1, geos 3.8.1 -> 3.8.1_1, geos 3.8.1 -> 3.8.1_1, glib 2.64.1 -> 2.64.3, 
glib 2.64.1 -> 2.64.3, glib 2.64.1 -> 2.64.3, glib 2.64.1 -> 2.64.3, gnutls 
3.6.12 -> 3.6.14, gnutls 3.6.12 -> 3.6.14, hdf5 1.12.0 -> 1.12.0_1, krb5 1.18 
-> 1.18.2, libdap 3.20.5 -> 3.20.6, libdap 3.20.5 -> 3.20.6, libdap 3.20.5 -> 
3.20.6, libpq 12.2_1 -> 12.3, libssh 0.9.3 -> 0.9.4, libxml2 2.9.10 -> 
2.9.10_1, libxml2 2.9.10 -> 2.9.10_1, libxml2 2.9.10 -> 2.9.10_1, mercurial 
5.3.1 -> 5.4.2, mercurial 5.3.1 -> 5.4.2, mercurial 5.3.1 -> 5.4.2, netcdf 
4.7.3_2 -> 4.7.4_1, node 13.12.0 -> 14.5.0, p11-kit 0.23.20 -> 0.23.20_1, 
poppler 0.86.1_1 -> 0.90.0, poppler 0.86.1_1 -> 0.90.0, poppler 0.86.1_1 -> 
0.90.0, poppler 0.86.1_1 -> 0.90.0, postgis 3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 
-> 3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, postgis 
3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, 
postgis 3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, postgis 3.0.1_1 -> 
3.0.1_2, postgis 3.0.1_1 -> 3.0.1_2, postgresql 12.2 -> 12.3_4, postgresql 12.2 
-> 12.3_4, protobuf-c 1.3.3 -> 1.3.3_1, pyenv 1.2.17 -> 1.2.19, sfcgal 1.3.7_2 
-> 1.3.8, sfcgal 1.3.7_2 -> 1.3.8, unbound 1.10.0 -> 1.10.1
   ```
   
   which times out. 
https://travis-ci.org/github/ursa-labs/crossbow/builds/705060949



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7664:
URL: https://github.com/apache/arrow/pull/7664#issuecomment-655061788


   https://issues.apache.org/jira/browse/ARROW-9265



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-07 Thread GitBox


pitrou commented on pull request #7664:
URL: https://github.com/apache/arrow/pull/7664#issuecomment-655061168


   Note that https://github.com/apache/arrow-testing/pull/35 needs to be merged 
first.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #7664: ARROW-9265: [C++] Allow writing and reading V4-compliant IPC data

2020-07-07 Thread GitBox


pitrou opened a new pull request #7664:
URL: https://github.com/apache/arrow/pull/7664


   V4 Union arrays with top-level null slots are disallowed, though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7660: ARROW-9291 [R]: Support fixed size binary/list types

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7660:
URL: https://github.com/apache/arrow/pull/7660#issuecomment-655054010


   Let me see if I understand the issues you're concerned with. We have some 
inconsistency in how types are converted: `raw` -> `uint8`, but `uint8` -> 
`integer`, for example. Here, `binary` -> `list_of(raw)`, and we've 
special-cased `list_of(raw)` -> `binary`, but you might think that it should be 
`list_of(uint8)` given how the non-list type is converted. 
   
   I'm fine with getting rid of the special-casing of `list_of(raw)` to 
`binary` and having the binary conversion set an R class (subclass of vctrs 
list of or whatever), which we use to determine that the list of raw should be 
converted back to BinaryArray. I'm not particularly worried that someone would 
have a list of raw vectors in R and be frustrated that that wasn't converted to 
a list of uint8--seems unlikely, and if you really care, you can always specify 
a type/schema. But it seems reasonable to be explicit and set a class attribute.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


pitrou commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655036007


   I am talking about the kernel name as well. "unicode" is non-descriptive.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox


nealrichardson closed pull request #7648:
URL: https://github.com/apache/arrow/pull/7648


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7648:
URL: https://github.com/apache/arrow/pull/7648#issuecomment-655026342


   Ok then I'll merge this and we can come back and improve it later.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] healiseu opened a new issue #7663: Sorting on pyarrow data structures ?

2020-07-07 Thread GitBox


healiseu opened a new issue #7663:
URL: https://github.com/apache/arrow/issues/7663


   Hi, I consider sorting a fundamental operation for any in-memory data 
structures, including those of PyArrow.  
   
   It would be nice if pa.array, pa.table, etc had sorting methods but I did 
not find any. One has to pass sorting indices calculated from some other 
library, such as numpy, to sort them. Sorting indices could have been 
calculated directly from PyArrow. Am I missing something here ? That increases 
significantly complexity for the developer.
   
   Do you have any plans on implementing such a feature in the near future ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655020811


   Revision: 5970625fa8324681167e202beaf45ea1a24fc674
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-382](https://github.com/ursa-labs/crossbow/branches/all?query=actions-382)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp37m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-382-appveyor-wheel-win-cp37m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


kszucs commented on a change in pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#discussion_r451037044



##
File path: cpp/cmake_modules/ThirdpartyToolchain.cmake
##
@@ -2069,7 +2069,7 @@ macro(build_utf8proc)
   "-DCMAKE_INSTALL_PREFIX=${UTF8PROC_PREFIX}"
   -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
   -DCMAKE_INSTALL_LIBDIR=lib
-  -DDBUILD_SHARED_LIBS=OFF)
+  -DBUILD_SHARED_LIBS=OFF)

Review comment:
   I think this was the original issue, because utf8proc wasn't available 
from conda, so it was bundled.
   
   Need to run other packaging builds as well if it works.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7658: ARROW-9305: [Python] Dependency load failure in Windows wheel build

2020-07-07 Thread GitBox


kszucs commented on pull request #7658:
URL: https://github.com/apache/arrow/pull/7658#issuecomment-655019861


   @github-actions crossbow submit wheel-win-cp37m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] sunchao commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec

2020-07-07 Thread GitBox


sunchao commented on pull request #7176:
URL: https://github.com/apache/arrow/pull/7176#issuecomment-655010984


   Sorry for the late response. Yes agree that API compatibility is not a big 
concern here. I think the change is useful and my only concern is the new 
constraint on how writer is going to be used. We can work on improving that 
later though.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7645:
URL: https://github.com/apache/arrow/pull/7645#issuecomment-655008360


   Re: ChunkedArray print method, `git blame` says it was introduced in #5492. 
I would guess that I added a custom method so that the printing wouldn't 
explode off the screen if you have a big array. Or maybe so that the internal 
chunking details weren't exposed since that's not always helpful. Ok with me if 
you want to change it, but I wouldn't unify the dictionaries in the print 
method--if you're trying to show more about the internals of what's in the 
array, show what's actually there.
   
   Re: dictionariesAsFactors, that's ARROW-7657, fine to keep it out of scope 
here.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


maartenbreddels commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655006474


   I am not talking about the C++ code, only the kernel name. UTF8 is just the 
encoding, Unicode refers to the semantics of the kernel. But if utf16/32 if 
excluded, I guess we can keep the old naming scheme.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7662: ARROW-9347: [Python] Fix mv in fsspec handler for directories

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7662:
URL: https://github.com/apache/arrow/pull/7662#issuecomment-655006163


   https://issues.apache.org/jira/browse/ARROW-9347



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox


wesm commented on pull request #7648:
URL: https://github.com/apache/arrow/pull/7648#issuecomment-655004137


   As long as the implementation details / semantics aren't exposed (they don't 
seem to be), this seems sufficient to me to have the feature established, and 
we an always return later and make it more efficient (or replace it with the 
iterator mechanism that will hopefully have some discussion on the ML)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


pitrou commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655003555


   > Are there plans to support utf16/32?
   
   I think that should be out of scope. There's no Arrow data type for utf16 
and utf32.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche opened a new pull request #7662: ARROW-9347: [Python] Fix mv in fsspec handler for directories

2020-07-07 Thread GitBox


jorisvandenbossche opened a new pull request #7662:
URL: https://github.com/apache/arrow/pull/7662


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


pitrou commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655003292


   "utf8" is used everywhere, please keep it like this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


maartenbreddels commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655001506


   If it is not off the table, I propose at least utf8->unicode renaming, to be 
future compatible.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] xhochy commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


xhochy commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-655000427


   > Are there plans to support utf16/32? Seeing the code as it is now, it 
would be trivial to add.
   
   This would be a longer discussion. Personally I would vote "no" to limit the 
scope / dimensionality of Arrow's memory model for now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


maartenbreddels commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654999617


   Are there plans to support utf16/32? Seeing the code as it is now, it would 
be trivial to add.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


pitrou commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654999442


   We should be consistent. We already have "utf8_lower" and "utf8_upper".



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7648:
URL: https://github.com/apache/arrow/pull/7648#issuecomment-654999245


   @wesm do you have an opinion on this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


wesm commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654998304


   I think that having e.g. `string_lower_utf8` and `string_lower_ascii` is 
fine, too. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] xhochy commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


xhochy commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654996646


   I'm indifferent to `utf8_` vs `string_` (sometimes our codebase is too) but 
other than that I fully agree to @wesm naming scheme. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


wesm commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654994900


   OK, I would also like to solicit @xhochy's and @pitrou's preferences in the 
function naming. I don't want to spend too much time on it, but as long as we 
have a consistent naming scheme that accommodates the anticipated use cases, 
that sounds fine to me.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] maartenbreddels commented on pull request #7656: ARROW-9268: [C++] add string_is{alpnum,alpha...,upper} kernels

2020-07-07 Thread GitBox


maartenbreddels commented on pull request #7656:
URL: https://github.com/apache/arrow/pull/7656#issuecomment-654991684


   I wanted to focus on the other issues first, but I propose a change in 
naming by this:
`__`, since all you care about is that it's a 
function that operates on a string and all the ascii variants with with all 
utf8 strings. Also, I don't know if in the future there will be utf16/utf32 
support, in that case, this naming scheme makes more sense. I'm ok changing it 
back or open a PR to make this consistent (`string_lower_ascii`, 
`string_lower_uncode`, ... etc).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-07 Thread GitBox


jorisvandenbossche commented on pull request #7623:
URL: https://github.com/apache/arrow/pull/7623#issuecomment-654981357


   I think @bkietz is still taking a look today?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450989955



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null_test.cc
##
@@ -0,0 +1,137 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_binary.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/memory_pool.h"
+#include "arrow/result.h"
+#include "arrow/status.h"
+#include "arrow/testing/gtest_compat.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+
+namespace arrow {
+namespace compute {
+
+template ::c_type>
+void CheckFillNull(const std::shared_ptr& type, const 
std::vector& in_values,
+   const std::vector& in_is_valid, const Datum 
fill_value,
+   const std::vector& out_values,
+   const std::vector& out_is_valid) {
+  std::shared_ptr input = _MakeArray(type, in_values, 
in_is_valid);
+  std::shared_ptr expected = _MakeArray(type, out_values, 
out_is_valid);
+
+  ASSERT_OK_AND_ASSIGN(Datum datum_out, FillNull(input, fill_value));
+  std::shared_ptr result = datum_out.make_array();
+  ASSERT_OK(result->ValidateFull());
+  AssertArraysEqual(*expected, *result, /*verbose=*/true);
+}
+
+class TestFillNullKernel : public ::testing::Test {};
+
+template 
+class TestFillNullPrimitive : public ::testing::Test {};
+
+typedef ::testing::Types
+PrimitiveTypes;

Review comment:
   I see them declared a couple of times in the code base as 
PrimitiveDictionaries eg in `scalar_set_lookup_test.cc` - should I use that 
instead?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450987544



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null_test.cc
##
@@ -0,0 +1,137 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_binary.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/memory_pool.h"
+#include "arrow/result.h"
+#include "arrow/status.h"
+#include "arrow/testing/gtest_compat.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+
+namespace arrow {
+namespace compute {
+
+template ::c_type>
+void CheckFillNull(const std::shared_ptr& type, const 
std::vector& in_values,
+   const std::vector& in_is_valid, const Datum 
fill_value,
+   const std::vector& out_values,
+   const std::vector& out_is_valid) {
+  std::shared_ptr input = _MakeArray(type, in_values, 
in_is_valid);
+  std::shared_ptr expected = _MakeArray(type, out_values, 
out_is_valid);
+
+  ASSERT_OK_AND_ASSIGN(Datum datum_out, FillNull(input, fill_value));
+  std::shared_ptr result = datum_out.make_array();
+  ASSERT_OK(result->ValidateFull());
+  AssertArraysEqual(*expected, *result, /*verbose=*/true);
+}

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450987676



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null_test.cc
##
@@ -0,0 +1,137 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_binary.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/memory_pool.h"
+#include "arrow/result.h"
+#include "arrow/status.h"
+#include "arrow/testing/gtest_compat.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/type.h"
+#include "arrow/type_traits.h"
+
+namespace arrow {
+namespace compute {
+
+template ::c_type>
+void CheckFillNull(const std::shared_ptr& type, const 
std::vector& in_values,
+   const std::vector& in_is_valid, const Datum 
fill_value,
+   const std::vector& out_values,
+   const std::vector& out_is_valid) {
+  std::shared_ptr input = _MakeArray(type, in_values, 
in_is_valid);
+  std::shared_ptr expected = _MakeArray(type, out_values, 
out_is_valid);
+
+  ASSERT_OK_AND_ASSIGN(Datum datum_out, FillNull(input, fill_value));
+  std::shared_ptr result = datum_out.make_array();
+  ASSERT_OK(result->ValidateFull());
+  AssertArraysEqual(*expected, *result, /*verbose=*/true);
+}
+
+class TestFillNullKernel : public ::testing::Test {};
+
+template 
+class TestFillNullPrimitive : public ::testing::Test {};
+
+typedef ::testing::Types
+PrimitiveTypes;
+
+TYPED_TEST_SUITE(TestFillNullPrimitive, PrimitiveTypes);
+
+TYPED_TEST(TestFillNullPrimitive, FillNull) {
+  using T = typename TypeParam::c_type;
+  using ScalarType = typename TypeTraits::ScalarType;
+  auto type = TypeTraits::type_singleton();
+  auto scalar = std::make_shared(static_cast(5));
+  // No Nulls
+  CheckFillNull(type, {2, 4, 7, 9}, {true, true, true, true}, 
Datum(scalar),
+  {2, 4, 7, 9}, {true, true, true, true});
+  // Some Nulls
+  CheckFillNull(type, {2, 4, 7, 8}, {false, true, false, true},
+  Datum(scalar), {5, 4, 5, 8}, {true, true, true, 
true});
+  // Empty Array
+  CheckFillNull(type, {}, {}, Datum(scalar), {}, {});
+}
+
+TEST_F(TestFillNullKernel, FillNullNull) {
+  auto datum = Datum(std::make_shared());
+  CheckFillNull(null(), {0, 0, 0, 0},
+  {false, false, false, false}, datum,
+  {0, 0, 0, 0}, {false, false, false, 
false});
+  CheckFillNull(null(), {NULL, NULL, NULL, NULL}, 
{}, datum,
+  {NULL, NULL, NULL, NULL}, {});
+  CheckFillNull(null(), {0, 0, 0, 0},
+  {false, false, false, false}, datum,
+  {0, 0, 0, 0}, {false, false, false, 
false});
+  CheckFillNull(null(), {NULL, NULL, NULL, NULL}, 
{}, datum,
+  {NULL, NULL, NULL, NULL}, {});
+}
+
+TEST_F(TestFillNullKernel, FillNullBoolean) {
+  auto scalar1 = std::make_shared(false);
+  auto scalar2 = std::make_shared(true);
+  // no nulls
+  CheckFillNull(boolean(), {true, false, true, false},
+   {true, true, true, true}, Datum(scalar1),
+   {true, false, true, false}, {true, true, 
true, true});
+  // some nulls
+  CheckFillNull(boolean(), {true, false, true, false},
+   {false, true, true, false}, Datum(scalar1),
+   {false, false, true, false}, {true, true, 
true, true});
+  CheckFillNull(boolean(), {true, false, true, false},
+   {false, true, false, false}, Datum(scalar2),
+   {true, false, true, true}, {true, true, 
true, true});
+}
+
+TEST_F(TestFillNullKernel, FillNullTimeStamp) {
+  auto time32_type = time32(TimeUnit::SECOND);
+  auto time64_type = time64(TimeUnit::NANO);
+  auto scalar1 = 

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450986992



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc
##
@@ -0,0 +1,223 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/common.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_writer.h"
+#include "arrow/visitor_inline.h"
+
+namespace arrow {
+
+namespace compute {
+namespace internal {
+namespace {
+
+template 
+using enable_if_supports_fill_null = enable_if_t::value, R>;
+
+template 
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool* pool) {}
+
+  Status Init(const FillNullOptions& options) {
+fill_value = options.fill_value.scalar();
+return Status::OK();
+  }
+
+  std::shared_ptr fill_value;
+};
+
+template <>
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool*) {}
+
+  Status Init(const FillNullOptions& options) { return Status::OK(); }
+
+  std::shared_ptr fill_value;
+};
+
+struct InitFillNullStateVisitor {
+  KernelContext* ctx;
+  const FillNullOptions* options;
+  std::unique_ptr result;
+
+  InitFillNullStateVisitor(KernelContext* ctx, const FillNullOptions* options)
+  : ctx(ctx), options(options) {}
+
+  template 
+  Status Init() {
+using StateType = FillNullState;
+result.reset(new StateType(ctx->exec_context()->memory_pool()));
+return static_cast(result.get())->Init(*options);
+  }
+
+  Status Visit(const DataType&) { return Init(); }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+return Init();
+  }
+
+  Status GetResult(std::unique_ptr* out) {
+RETURN_NOT_OK(VisitTypeInline(*options->fill_value.type(), this));
+*out = std::move(result);
+return Status::OK();
+  }
+};
+
+std::unique_ptr InitFillNull(KernelContext* ctx,
+  const KernelInitArgs& args) {
+  InitFillNullStateVisitor visitor{ctx,
+   static_cast(args.options)};
+  std::unique_ptr result;
+  ctx->SetStatus(visitor.GetResult());
+  return result;
+}
+
+struct ScalarFillVisitor {
+  KernelContext* ctx;
+  const ArrayData& data;
+  Datum* out;
+
+  ScalarFillVisitor(KernelContext* ctx, const ArrayData& data, Datum* out)
+  : ctx(ctx), data(data), out(out) {}
+
+  Status Visit(const DataType&) {
+ArrayData* out_arr = out->mutable_array();
+*out_arr = data;
+return Status::OK();
+  }
+
+  Status Visit(const BooleanType&) {
+const auto& state = checked_cast&>(*ctx->state());
+bool value = UnboxScalar::Unbox(*state.fill_value);
+ArrayData* out_arr = out->mutable_array();
+FirstTimeBitmapWriter bit_writer(out_arr->buffers[1]->mutable_data(), 
out_arr->offset,
+ out_arr->length);
+FirstTimeBitmapWriter 
bit_writer_validity(out_arr->buffers[0]->mutable_data(),
+  out_arr->offset, 
out_arr->length);
+if (data.null_count != 0) {
+  BitmapReader bit_reader(data.buffers[1]->data(), data.offset, 
data.length);
+  BitmapReader bit_reader_validity(data.buffers[0]->data(), data.offset, 
data.length);
+  for (int64_t i = 0; i < data.length; i++) {
+if (bit_reader_validity.IsNotSet()) {
+  if (value == true) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+} else {
+  if (bit_reader.IsSet()) {
+bit_writer.Set();
+  } else {
+bit_writer.Clear();
+  }
+  bit_writer_validity.Set();
+}
+bit_reader.Next();
+bit_writer.Next();
+bit_reader_validity.Next();
+bit_writer_validity.Next();
+  }
+  bit_writer_validity.Finish();
+  bit_writer.Finish();
+} else {
+  *out_arr = data;
+}
+return Status::OK();
+  }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+using T = typename GetViewType::T;
+const 

[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450986751



##
File path: cpp/src/arrow/compute/kernels/scalar_fill_null.cc
##
@@ -0,0 +1,223 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/array/array_base.h"
+#include "arrow/array/builder_primitive.h"
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/common.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/bitmap_writer.h"
+#include "arrow/visitor_inline.h"
+
+namespace arrow {
+
+namespace compute {
+namespace internal {
+namespace {
+
+template 
+using enable_if_supports_fill_null = enable_if_t::value, R>;
+
+template 
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool* pool) {}
+
+  Status Init(const FillNullOptions& options) {
+fill_value = options.fill_value.scalar();
+return Status::OK();
+  }
+
+  std::shared_ptr fill_value;
+};
+
+template <>
+struct FillNullState : public KernelState {
+  explicit FillNullState(MemoryPool*) {}
+
+  Status Init(const FillNullOptions& options) { return Status::OK(); }
+
+  std::shared_ptr fill_value;
+};
+
+struct InitFillNullStateVisitor {
+  KernelContext* ctx;
+  const FillNullOptions* options;
+  std::unique_ptr result;
+
+  InitFillNullStateVisitor(KernelContext* ctx, const FillNullOptions* options)
+  : ctx(ctx), options(options) {}
+
+  template 
+  Status Init() {
+using StateType = FillNullState;
+result.reset(new StateType(ctx->exec_context()->memory_pool()));
+return static_cast(result.get())->Init(*options);
+  }
+
+  Status Visit(const DataType&) { return Init(); }
+
+  template 
+  enable_if_supports_fill_null Visit(const Type&) {
+return Init();
+  }
+
+  Status GetResult(std::unique_ptr* out) {
+RETURN_NOT_OK(VisitTypeInline(*options->fill_value.type(), this));
+*out = std::move(result);
+return Status::OK();
+  }
+};
+
+std::unique_ptr InitFillNull(KernelContext* ctx,
+  const KernelInitArgs& args) {
+  InitFillNullStateVisitor visitor{ctx,
+   static_cast(args.options)};
+  std::unique_ptr result;
+  ctx->SetStatus(visitor.GetResult());
+  return result;
+}
+
+struct ScalarFillVisitor {
+  KernelContext* ctx;
+  const ArrayData& data;
+  Datum* out;
+
+  ScalarFillVisitor(KernelContext* ctx, const ArrayData& data, Datum* out)
+  : ctx(ctx), data(data), out(out) {}
+
+  Status Visit(const DataType&) {
+ArrayData* out_arr = out->mutable_array();
+*out_arr = data;
+return Status::OK();
+  }
+
+  Status Visit(const BooleanType&) {
+const auto& state = checked_cast&>(*ctx->state());
+bool value = UnboxScalar::Unbox(*state.fill_value);
+ArrayData* out_arr = out->mutable_array();
+FirstTimeBitmapWriter bit_writer(out_arr->buffers[1]->mutable_data(), 
out_arr->offset,
+ out_arr->length);
+FirstTimeBitmapWriter 
bit_writer_validity(out_arr->buffers[0]->mutable_data(),
+  out_arr->offset, 
out_arr->length);

Review comment:
   I have reworked this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] c-jamie commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null

2020-07-07 Thread GitBox


c-jamie commented on a change in pull request #7635:
URL: https://github.com/apache/arrow/pull/7635#discussion_r450986449



##
File path: cpp/src/arrow/compute/api_scalar.cc
##
@@ -126,5 +126,24 @@ Result Compare(const Datum& left, const Datum& 
right, CompareOptions opti
 SCALAR_EAGER_UNARY(IsValid, "is_valid")
 SCALAR_EAGER_UNARY(IsNull, "is_null")
 
+Result FillNull(const Datum& values, const Datum& fill_value, 
ExecContext* ctx) {
+  if (!values.is_arraylike()) {
+return Status::Invalid("Values must be Array or ChunkedArray");
+  }
+
+  if (!fill_value.is_scalar()) {
+return Status::Invalid("fill value must be a scalar");
+  }
+
+  if (!values.type()->Equals(fill_value.type())) {
+std::stringstream ss;
+ss << "Array type didn't match type of fill value: " << 
values.type()->ToString()
+   << " vs " << fill_value.type()->ToString();
+return Status::Invalid(ss.str());
+  }

Review comment:
   This has been reworked





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7661: ARROW-9020: [Python] read_json won't respect explicit_schema in parse_options

2020-07-07 Thread GitBox


github-actions[bot] commented on pull request #7661:
URL: https://github.com/apache/arrow/pull/7661#issuecomment-654969459


   https://issues.apache.org/jira/browse/ARROW-9020



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7659: ARROW-9287: [C++] Support unsigned dictionary indices

2020-07-07 Thread GitBox


wesm commented on pull request #7659:
URL: https://github.com/apache/arrow/pull/7659#issuecomment-654964007


   @kszucs done



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-07 Thread GitBox


nealrichardson commented on pull request #7623:
URL: https://github.com/apache/arrow/pull/7623#issuecomment-654963305


   CI is green. @jorisvandenbossche is this good to merge?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   >