[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359


   I've started a discussion on the 
[mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make 
other people aware of your efforts.
   
   I wonder if creating a `LETypedBufferBuilder` would make more sense than 
adding `AppendLE` methods (I don't think it makes sense to use two different 
endiannesses in a single buffer). It should probably be discussed on the ML.
   
   As for `Serialize`, I can't really tell you. Parquet encoding routines seem 
a bit all over the place (most of it happens in `encoding.cc`). Perhaps other 
developers can chime in...
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616667220


   https://issues.apache.org/jira/browse/ARROW-8477



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616759329


   Thank you for staring the discussion. I will watch at the thread.
   
   Yeah, `LETypedBufferBuilder` makes sense. It looks better than adding 
`AppendLE`.
   
   Regarding `Serialize`, it looks a good place where the class has both types 
of an element for Arrow and Parquet. But, encoding (i.e RLE and others) happens 
in `encoding.cc`. Let me check it tomorrow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


vertexclique commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411501620



##
File path: rust/arrow/src/array/builder.rs
##
@@ -236,6 +251,14 @@ impl BufferBuilderTrait for 
BufferBuilder {
 self.write_bytes(v.to_byte_slice(), 1)
 }
 
+default fn append_n( self, n: usize, v: T::Native) -> Result<()> {
+self.reserve(n)?;
+for _ in 0..n {
+self.write_bytes(v.to_byte_slice(), 1)?;
+}

Review comment:
   Any preference to not to use an iterator here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


kiszk commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411585778



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t 
dictionary_length) {
 template 
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t 
dictionary_length,
 T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   For Parquet use case, the max width of the index is 32 based on 
https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142


   Your changes look good.  Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-20 Thread GitBox


BryanCutler commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r411513297



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java
##
@@ -34,31 +33,24 @@
   static final UnsafeDirectLittleEndian EMPTY = INNER_ALLOCATOR.empty;
   static final long CHUNK_SIZE = INNER_ALLOCATOR.getChunkSize();
 
-  private final int allocatedSize;
-  private final UnsafeDirectLittleEndian memoryChunk;
+  private final long allocatedSize;
 
-  NettyAllocationManager(BaseAllocator accountingAllocator, int requestedSize) 
{
-super(accountingAllocator);
-this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize);

Review comment:
   I don't think we should remove this which effectively replaces all 
allocations done in Arrow Java, which is a big change. `INNER_ALLOCATOR` also 
uses a pool which has some benefits. Instead, can you just change 
`requestedSize` to be a long, then check if it is over the max Int size and 
only then use `PlatformDependent.allocateMemory`?

##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ *Make sure there are 4GB memory available in the system.
+ * 
+ *   Make sure the default allocation manager type is unsafe.

Review comment:
   please update

##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ *Make sure there are 4GB memory available in the system.
+ * 
+ *   Make sure the default allocation manager type is unsafe.
+ *   This can be achieved by the environmental variable or system property.
+ *   The details can be found in {@link DefaultAllocationManagerOption}.
+ * 
+ */
+public class TestLargeArrowBuf {
+
+  private static void testLargeArrowBuf() {
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ ArrowBuf largeBuf = allocator.buffer(bufSize)) {
+  assertEquals(bufSize, largeBuf.capacity());
+  System.out.println("Successfully allocated a buffer with capacity " + 
largeBuf.capacity());
+
+  for (long i = 0; i < bufSize / 8; i++) {
+largeBuf.setLong(i * 8, i);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " long 
words");
+}
+  }
+  System.out.println("Successfully written " + (bufSize / 8) + " long 
words");
+
+  for (long i = 0; i < bufSize / 8; i++) {
+long val = largeBuf.getLong(i * 8);
+assertEquals(i, val);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully read " + (i + 1) + " long words");
+}
+  }
+  System.out.println("Successfully read " + (bufSize / 8) + " long words");
+}
+

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616703869


   Looks like Windows long paths are enabled by default on Github Actions. Cool!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616748578


   The remaining CI failure is unrelated.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tpboudreau commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411569025



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   I experienced failures in the CI pipeline on macOS and I was unable to 
locate clear documentation of the path name limits (I'm not a macOS expert). 
   
   I figured it might be best to separately address macOS in another issue, if 
there's community support for that.  (This patch leaves macOS builds and tests 
unchanged.)
   
   If you believe this test should run as is under macOS, I'll remove the 
#ifndef and follow up on any issues.
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tpboudreau opened a new pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau opened a new pull request #6993:
URL: https://github.com/apache/arrow/pull/6993


   This patch enables reading/writing of files with long (>260 characters) 
pathnames in Windows.
   
   In order for the new test to run under Windows, both (1) the test host must 
have long paths enabled in its registry, and (2) the test executable 
(arrow_utility_test.exe) must include a manifest indicating support for long 
paths (see 
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#enable-long-paths-in-windows-10-version-1607-and-later).
  The test source code checks for (1) and the cmake file changes ensure (2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411547763



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   Why did you have to disable this test on macOS?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411570280



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   I was just wondering. According to Google searches, the path length 
limit on macOS may be 1024, which this test exceeds. We can keep it like that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tpboudreau edited a comment on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau edited a comment on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142


   Your fixups look good.  Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


emkornfield commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411577540



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t 
dictionary_length) {
 template 
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t 
dictionary_length,
 T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   might be worth a comment here and below why IndexType is always static.  
Would it be possible to add a unit test with a larger value that would have 
failed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


ursabot commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616716254


   [AMD64 Conda Crossbow Submit 
(#101910)](https://ci.ursalabs.org/#builders/98/builds/641) builder has been 
succeeded.
   
   Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
ursabot-572](https://github.com/ursa-labs/crossbow/branches/all?query=ursabot-572)
   
   |Task|Status|
   ||--|
   
|gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


pprudhvi commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616715997


   @ursabot crossbow submit -g gandiva



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411578282



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t 
dictionary_length) {
 template 
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t 
dictionary_length,
 T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   You mean a larger type, or an index larger than 2**32?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411675427



##
File path: rust/arrow/src/array/builder.rs
##
@@ -236,6 +251,14 @@ impl BufferBuilderTrait for 
BufferBuilder {
 self.write_bytes(v.to_byte_slice(), 1)
 }
 
+default fn append_n( self, n: usize, v: T::Native) -> Result<()> {
+self.reserve(n)?;
+for _ in 0..n {
+self.write_bytes(v.to_byte_slice(), 1)?;
+}

Review comment:
   I'm not sure I understand what you mean?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


bkietz commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616819386


   @github-actions crossbow submit -g nightly



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz opened a new pull request #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


bkietz opened a new pull request #6994:
URL: https://github.com/apache/arrow/pull/6994


   Add a `status.json` to the gh-pages summary of nightly builds to get around 
rate limiting



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887183


   @github-actions crossbow submit test-r-linux-as-cran



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820980


   https://issues.apache.org/jira/browse/ARROW-8043



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


kou commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616819824


   Wow! Awesome!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820253


   Revision: 89cf7325ab761a35b0c8a0da7096805984e18435
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-156](https://github.com/ursa-labs/crossbow/branches/all?query=actions-156)
   
   |Task|Status|
   ||--|
   |centos-6-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-6-amd64)|
   |centos-7-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-7-amd64)|
   |centos-8-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-8-amd64)|
   
|conda-linux-gcc-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py36)|
   
|conda-linux-gcc-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py37)|
   
|conda-linux-gcc-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py38)|
   
|conda-osx-clang-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py36)|
   
|conda-osx-clang-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py37)|
   
|conda-osx-clang-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py38)|
   
|conda-win-vs2015-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py36)|
   
|conda-win-vs2015-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py37)|
   
|conda-win-vs2015-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py38)|
   |debian-buster-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-buster-amd64)|
   |debian-stretch-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-stretch-amd64)|
   
|gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|homebrew-r-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-r-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |test-conda-cpp|[![Github 

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox


kou commented on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616869503


   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881056


   @github-actions crossbow submit test-r-linux-as-cran



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881426


   Revision: 1ed83aaf5dd17d4e3b31aa1cc657f1220da2c8d4
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-157](https://github.com/ursa-labs/crossbow/branches/all?query=actions-157)
   
   |Task|Status|
   ||--|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-157-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-157-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson opened a new pull request #6995:
URL: https://github.com/apache/arrow/pull/6995


   Having some trouble/slowness with r-hub for testing so made this PR to use 
crossbow.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887542


   Revision: 88c0198d775796d5a39644a22840a45470b4253f
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-158](https://github.com/ursa-labs/crossbow/branches/all?query=actions-158)
   
   |Task|Status|
   ||--|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-158-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-158-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669


   Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927551


   Revision: 69081241244da5decee0bf0ea3cb2f24059d244d
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-159](https://github.com/ursa-labs/crossbow/branches/all?query=actions-159)
   
   |Task|Status|
   ||--|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-159-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


nealrichardson commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927211


   @github-actions crossbow submit homebrew-cpp



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


nealrichardson opened a new pull request #6996:
URL: https://github.com/apache/arrow/pull/6996


   One more I didn't remove in ARROW-8222.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616931067


   https://issues.apache.org/jira/browse/ARROW-8538



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079


   @pitrou @wesm 
   Oops, I only checked case "BitmapReader" from benchmark 
[arrow-bit-util-benchmark](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_benchmark.cc).
 Obviously it's not enough.
   
   I compared all cases just now and see huge performance drop from below 4 
tests:
   
   Before this patch:
   ```bash
   BenchmarkBitmapAnd/32768/1 563496 ns   563260 ns 
1243 bytes_per_second=55.4806M/s
   BenchmarkBitmapAnd/131072/1   2219810 ns  2218984 ns 
 318 bytes_per_second=56.3321M/s
   BenchmarkBitmapAnd/32768/2 561738 ns   561467 ns 
1265 bytes_per_second=55.6577M/s
   BenchmarkBitmapAnd/131072/2   2246229 ns  2245119 ns 
 305 bytes_per_second=55.6763M/s
   ```
   
   After this patch:
   ```bash
   BenchmarkBitmapAnd/32768/11653467 ns  1652680 ns 
 422 bytes_per_second=18.9087M/s
   BenchmarkBitmapAnd/131072/1   6665501 ns  6661561 ns 
 105 bytes_per_second=18.7644M/s
   BenchmarkBitmapAnd/32768/21670793 ns  1670246 ns 
 423 bytes_per_second=18.7098M/s
   BenchmarkBitmapAnd/131072/2   6702369 ns  6698957 ns 
 103 bytes_per_second=18.6596M/s
   ```
   
   Before reverting this patch, I would like to understand why it happens.
   
   BTW: we definitely need continuous benchmark tools to detect these things 
early.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385


   I have been thinking about place candidates of the interface between the 
native endian and a PARQUET little-endian. 
   
   One of the good candidates is `Serialize()` in `parquet/column_writer.cc`.  
Another candidate is `TypedBufferBuilder` in `arrow/buffer_builder.h`.
   
   Regarding `Serialize()`, this is because there is [a conversion 
loop](https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1781-1783)
 for Decimal128 that uses BigEndian. For big-endian, `Serialize()` of other 
primitive types including int96 needs to have such as conversion loop to 
little-endian. This is the first step.
   
   While the above approach leads to additional overhead, it would be good to 
have new methods
   `AppendLE` and `UnsafeAppendLE` in `TypedBufferBuilder` in addition to 
[`Append()` and 
`UnsafeAppend`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L204-L240].
 These new method ensures to write typed data in little-endian.
   
   I think that we can support big-endian in Parquet using a two-step approach. 
What do you think?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6989:
URL: https://github.com/apache/arrow/pull/6989#issuecomment-616395745


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-20 Thread GitBox


jorisvandenbossche commented on issue #6970:
URL: https://github.com/apache/arrow/pull/6970#issuecomment-616366094


   Should we document this in the slice docstring that if the step is not 1, it 
will be a copy (take) and not a zero-copy view? (as I think people will 
typically assume no copy when slicing)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold edited a comment on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold edited a comment on issue #6980:
URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880


   I built the docker image locally and ran the same script as the CI, however, 
I am unable to reproduce the linker error... The ursabot issue seems to have 
fixed itself though, which is good I guess, but I'm probably going to need a 
hand with diagnosing the debian CI issue



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899


   Hmm, I don't think that's right. `Int96` is the physical representation of 
96-bit integers in Parquet files, and it's entirely little-endian. This means 
it should always have the same bit-representation, regardless of the platform's 
endianness.
   
   I think there are several places that need to be fixed:
   * the `Int96` tests in `parquet/arrow/arrow_reader_writer_test.cc`
   * the various `ToImpalaTimestamp` conversion functions in 
`parquet/column_writer.h`
   * the various `Int96` helper functions in `parquet/types.h`
   
   (I may be missing one or more)
   
   Note that `Int96` types are deprecated, so you may not want to lose your 
sweat over this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411174949



##
File path: rust/arrow/src/array/builder.rs
##
@@ -301,6 +324,21 @@ impl BufferBuilderTrait for 
BufferBuilder {
 Ok(())
 }
 
+fn append_n( self, n: usize, v: bool) -> Result<()> {
+self.reserve(n)?;
+if v {
+unsafe {
+bit_util::set_bits_raw(
+self.buffer.raw_data() as *mut u8,

Review comment:
   Changed, and fixed the others in the same file





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616358473


   https://issues.apache.org/jira/browse/ARROW-8524



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on issue #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox


jorisvandenbossche commented on issue #6961:
URL: https://github.com/apache/arrow/pull/6961#issuecomment-616389505


   > wheels-linux: 3.8 has a test failure (test_construct_from_list_of_files); 
François says he's seen this elsewhere. @jorisvandenbossche @kszucs is this 
another non-deterministic dataset test?
   
   @nealrichardson looks like it, yes (this was a newly introduced test) -> 
https://github.com/apache/arrow/pull/6989



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche opened a new pull request #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox


jorisvandenbossche opened a new pull request #6989:
URL: https://github.com/apache/arrow/pull/6989


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on issue #6980:
URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880


   I built the docker image locally and ran the same script as the CI, however, 
I am unable to reproduce the linker error... The ursabot issue seems to have 
fixed itself though, which is good I guess, but I'm probably going to need a 
hand with this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…

2020-04-20 Thread GitBox


zhztheplayer commented on issue #6967:
URL: https://github.com/apache/arrow/pull/6967#issuecomment-616345112


   OK but as the PR is already merged, maybe a follow-up JIRA ticket is needed?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


kszucs commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616356603


   @github-actions crossbow submit ubuntu-bionic-amd64 test-conda-cpp 
test-r-linux-as-cran



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616334547


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #6988: [CI] Try to free up space on github actions [WIP]

2020-04-20 Thread GitBox


kszucs opened a new pull request #6988:
URL: https://github.com/apache/arrow/pull/6988


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616357456


   Revision: f19af84b7b6af216b91a56956590ebce051b69c7
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-155](https://github.com/ursa-labs/crossbow/branches/all?query=actions-155)
   
   |Task|Status|
   ||--|
   |test-conda-cpp|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-test-conda-cpp)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-test-conda-cpp)|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-test-r-linux-as-cran)|
   |ubuntu-bionic-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-ubuntu-bionic-amd64)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk edited a comment on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606


   Thank you for your clarification. Based on this, `Int96` structure in memory 
can be represented in native endian. When it will be written into a file, we 
carefully have to keep it in a little-endian.
   
   ~~I just realized it that int32 in [int64, int32] is still a little-endian. 
After one or two hours, I will update this PR.~~ I will be back to here after 
one hour.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox


kszucs commented on a change in pull request #6961:
URL: https://github.com/apache/arrow/pull/6961#discussion_r411236556



##
File path: dev/tasks/verify-rc/github.nix.yml
##
@@ -64,8 +69,9 @@ jobs:
 fi
 if [ "$TEST_RUBY" = "1" ]; then
   ruby --version
+  sudo gem install bundler
 fi
   fi
-  # TODO: put 0.16.0 2 in some separate file?
+  # TODO: put version and rc number in some separate file?

Review comment:
   Not yet, created jira https://issues.apache.org/jira/browse/ARROW-8525





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


pitrou commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493


   @cyb70289 It's ok, we can keep this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox


kszucs commented on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081


   @kou we don't need to move to travis, I [managed to free up 
23GB(https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd)
 of additional space on github actions, and we can free up more if required.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs edited a comment on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox


kszucs edited a comment on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081


   @kou we don't need to move to travis, I [managed to free up 
23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd)
 of additional space on github actions, and we can free up more if required.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk edited a comment on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579


   I understand your point. First, I implemented the approach ` entirely 
little-endian`.
   
   Then, I reconsidered it. I thought that each primitive type should be 
represented in a little-endian as shown 
[here](https://github.com/apache/parquet-format/blob/master/Encodings.md) in 
**Parquet serialized form (e.g. file)**.  For example, int32 is kept using a 
native endian in a memory or register. It will be converted from/to a 
little-endian when it will be serialized. I extended this to Int96. This is a 
background of the current decision.
   
   Both approaches can work functionally correct. I understand that this is an 
implementation decision.
   
   It is ok with your approach. At that time, I have one question. How will we 
handle other primitive types (e.g. int32 or double)? Do we keep other 
primitives in entirely little-endian on both platforms (little-endian and 
big-endian CPUs)?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616456984


   I think data should be kept in native endianness in memory (that is what the 
user would expect). What we must be careful is that Parquet data is encoded 
(and decoded) as little endian. Your PR doesn't address that, AFAICT.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


kszucs commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616432127


   Build failures are unrelated. 
   
   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579


   I understand your point. First, I implemented the approach ` entirely 
little-endian`.
   
   Then, I reconsidered it. Each primitive type should be represented in a 
little-endian as shown 
[here](https://github.com/apache/parquet-format/blob/master/Encodings.md) in 
**Parquet serialized form (e.g. file)**.  For example, int32 is kept using a 
native endian in a memory or register. It will be converted from/to a 
little-endian when it will be serialized. I extended this to Int96. This is a 
background of the current decision.
   
   Both approaches can work functionally correct. I understand that this is an 
implementation decision.
   
   It is ok with your approach. At that time, I have one question. How will we 
handle other primitive types (e.g. int32 or double)? Do we keep other 
primitives in entirely little-endian on both platforms (little-endian and 
big-endian CPUs)?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606


   Thank you for your clarification. Based on this, `Int96` structure in memory 
can be represented in native endian. When it will be written into a file, we 
carefully have to keep it in a little-endian.
   
   I just realized it that int32 in [int64, int32] is still a little-endian. 
After one or two hours, I will update this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pprudhvi opened a new pull request #6990: ARROW-???? : fix gandiva macos build

2020-04-20 Thread GitBox


pprudhvi opened a new pull request #6990:
URL: https://github.com/apache/arrow/pull/6990


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


pprudhvi commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521137


   @ursabot crossbow submit -g gandiva



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


ursabot commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521423


   [AMD64 Conda Crossbow Submit 
(#101851)](https://ci.ursalabs.org/#builders/98/builds/640) builder has been 
succeeded.
   
   Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
ursabot-571](https://github.com/ursa-labs/crossbow/branches/all?query=ursabot-571)
   
   |Task|Status|
   ||--|
   
|gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-571-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-571-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616530174


   Can you add the explanation you gave above (about the memory layout) 
somewhere in `parquet/types.h`? Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox


wesm commented on issue #6954:
URL: https://github.com/apache/arrow/pull/6954#issuecomment-616533191


   I thought we had discussed removing the `ARROW_USE_SIMD` option altogether



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


wesm edited a comment on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement (is this captures in our benchmark executables?)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292


   At first, this PR address only test cases. This PR does not address the 
routines (like `column_writer.h`) yet. To update test cases can clarify the 
specification and implementation like this.
   
   In this change, int96, which is a pair of uint64 and uint32, is stored in 
memory as follows:
   ```
   For little-endian
   Int96 Int96_min = {{1024, 2048, 4096}};
   00 04 00 00 00 08 00 00 00 10 00 00
   ~~~ ~~~
   uint64 (LE) uint32 (LE)
   
   For big-endian,
   Int96 Int96_min = {{2048, 1024, 4096}};
   00 00 08 00 00 00 04 00 00 00 10 00
   ~~~ ~~~
   uint64 (BE) uint32 (BE)
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou opened a new pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


pitrou opened a new pull request #6991:
URL: https://github.com/apache/arrow/pull/6991


   NextCounts() should be parametered with the dictionary index type, not the 
value type.
   Previous code seems to have succeeded by chance on little-endian platforms.
   
   See discussion in ARROW-8486.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6991:
URL: https://github.com/apache/arrow/pull/6991#issuecomment-616529709


   https://issues.apache.org/jira/browse/ARROW-8529



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox


liyafan82 commented on a change in pull request #6912:
URL: https://github.com/apache/arrow/pull/6912#discussion_r411355185



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java
##
@@ -320,6 +322,20 @@ public int hashCode(int index, ArrowBufHasher hasher) {
 return visitor.visit(this, value);
   }
 
+  @Override
+  public void validate() {
+if (getValueCount() < 0) {
+  throw new RuntimeException("vector valueCount is negative");
+}
+
+if (getNullCount() > getValueCount()) {

Review comment:
   For the Java implementation, since the null count is computed each time, 
we shoud never expect getNullCount() > getValueCount()?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616540298


   Sure, at first, I derived this layout from [this 
method](https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.h#L576).
 
   
   Other resources:
   https://github.com/apache/arrow/issues/1920#issuecomment-541764279
   
https://stackoverflow.com/questions/53103762/cast-int96-timestamp-from-parquet-to-golang/53104516#53104516



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6966: ARROW-8497: [Archery] Add missing components to build options

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6966:
URL: https://github.com/apache/arrow/pull/6966#discussion_r411365701



##
File path: dev/archery/archery/cli.py
##
@@ -134,36 +134,72 @@ def _apply_options(cmd, options):
   help="CMake's CMAKE_BUILD_TYPE")
 @click.option("--warn-level", default="production", type=warn_level_type,
   help="Controls compiler warnings -W(no-)error.")
-# components
+@click.option("--use-gold-linker", default=True, type=BOOL,
+  help="Toggles ARROW_USE_LD_GOLD option.")
+# Tests and benchmarks
 @click.option("--with-tests", default=True, type=BOOL,
   help="Build with tests.")
-@click.option("--with-benchmarks", default=False, type=BOOL,
+@click.option("--with-benchmarks", default=None, type=BOOL,
   help="Build with benchmarks.")
-@click.option("--with-examples", default=False, type=BOOL,
+@click.option("--with-examples", default=None, type=BOOL,
   help="Build with examples.")
-@click.option("--with-python", default=False, type=BOOL,
-  help="Build with python extension.")
-@click.option("--with-parquet", default=False, type=BOOL,
-  help="Build with parquet file support.")
-@click.option("--with-gandiva", default=False, type=BOOL,
+@click.option("--with-integration", default=None, type=BOOL,
+  help="Build with integration test executables.")
+# Static checks
+@click.option("--use-asan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_ASAN sanitizer.")
+@click.option("--use-tsan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_TSAN sanitizer.")
+@click.option("--use-ubsan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_UBSAN sanitizer.")
+@click.option("--with-fuzzing", default=None, type=BOOL,
+  help="Toggles ARROW_FUZZING.")
+# Components
+@click.option("--with-compute", default=None, type=BOOL,
+  help="Build the arrow compute module.")
+@click.option("--with-csv", default=None, type=BOOL,
+  help="Build the arrow csv parser module.")
+@click.option("--with-cuda", default=None, type=BOOL,
+  help="Build the arrow cuda parser module.")
+@click.option("--with-dataset", default=None, type=BOOL,
+  help="Build the arrow dataset module.")
+@click.option("--with-filesystem", default=None, type=BOOL,
+  help="Build the arrow FileSystem layer.")

Review comment:
   Nit: no need to capitalize "filesystem".

##
File path: dev/archery/archery/cli.py
##
@@ -134,36 +134,72 @@ def _apply_options(cmd, options):
   help="CMake's CMAKE_BUILD_TYPE")
 @click.option("--warn-level", default="production", type=warn_level_type,
   help="Controls compiler warnings -W(no-)error.")
-# components
+@click.option("--use-gold-linker", default=True, type=BOOL,
+  help="Toggles ARROW_USE_LD_GOLD option.")
+# Tests and benchmarks
 @click.option("--with-tests", default=True, type=BOOL,
   help="Build with tests.")
-@click.option("--with-benchmarks", default=False, type=BOOL,
+@click.option("--with-benchmarks", default=None, type=BOOL,
   help="Build with benchmarks.")
-@click.option("--with-examples", default=False, type=BOOL,
+@click.option("--with-examples", default=None, type=BOOL,
   help="Build with examples.")
-@click.option("--with-python", default=False, type=BOOL,
-  help="Build with python extension.")
-@click.option("--with-parquet", default=False, type=BOOL,
-  help="Build with parquet file support.")
-@click.option("--with-gandiva", default=False, type=BOOL,
+@click.option("--with-integration", default=None, type=BOOL,
+  help="Build with integration test executables.")
+# Static checks
+@click.option("--use-asan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_ASAN sanitizer.")
+@click.option("--use-tsan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_TSAN sanitizer.")
+@click.option("--use-ubsan", default=None, type=BOOL,
+  help="Toggles ARROW_USE_UBSAN sanitizer.")
+@click.option("--with-fuzzing", default=None, type=BOOL,
+  help="Toggles ARROW_FUZZING.")
+# Components
+@click.option("--with-compute", default=None, type=BOOL,
+  help="Build the arrow compute module.")
+@click.option("--with-csv", default=None, type=BOOL,
+  help="Build the arrow csv parser module.")

Review comment:
   "CSV"?

##
File path: dev/archery/archery/cli.py
##
@@ -134,36 +134,72 @@ def _apply_options(cmd, options):
   help="CMake's CMAKE_BUILD_TYPE")
 @click.option("--warn-level", default="production", type=warn_level_type,
   help="Controls compiler warnings -W(no-)error.")
-# components
+@click.option("--use-gold-linker", default=True, type=BOOL,
+  help="Toggles ARROW_USE_LD_GOLD option.")
+# Tests 

[GitHub] [arrow] github-actions[bot] commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616522599


   https://issues.apache.org/jira/browse/ARROW-8528



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk edited a comment on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292


   At first, this PR addresses only test cases. This PR does not address the 
routines (like `column_writer.h`) yet. To update test cases can clarify the 
specification and implementation like this.
   
   In this change, int96, which is a pair of uint64 and uint32, is stored in 
memory as follows:
   ```
   For little-endian
   Int96 Int96_min = {{1024, 2048, 4096}};
   00 04 00 00 00 08 00 00 00 10 00 00
   ~~~ ~~~
   uint64 (LE) uint32 (LE)
   
   For big-endian,
   Int96 Int96_min = {{2048, 1024, 4096}};
   00 00 08 00 00 00 04 00 00 00 10 00
   ~~~ ~~~
   uint64 (BE) uint32 (BE)
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality

2020-04-20 Thread GitBox


liyafan82 commented on a change in pull request #6912:
URL: https://github.com/apache/arrow/pull/6912#discussion_r411352198



##
File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java
##
@@ -283,4 +283,10 @@
* @return the name of the vector.
*/
   String getName();
+
+  /**
+   * Validate the vector, will throw exception if validate fail.
+   */
+  void validate();

Review comment:
   I agree with you that it makes the client code simpler. 
   
   IMO, the real benefits are 1) one fewer virtual function call; and 2) one 
fewer stack frame. However, the cost is that it makes the interface larger and 
difficult to maintain. In addition, adding new APIs to a public interface has 
the risk of crashing client code. 
   
   So I think the method of adding new APIs should be reserved for 
performance-critical operations. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche opened a new pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox


jorisvandenbossche opened a new pull request #6992:
URL: https://github.com/apache/arrow/pull/6992


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox


jorisvandenbossche commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-616595450


   This is still WIP (depending on which pandas version we choose, we can clean 
up some things in the pandas-shim.pxi), but:
   
   - I defined the minimal pandas version now as pandas 0.23 (released 2 years 
ago). But, I also ran tests with pandas 0.22 and pandas 0.21, and tests are 
still passing for those. So we *could* also set this minimal version at 0.21 
(the mailing list thread that triggered this issue reported an error on pandas 
0.20). But personally, I think 0.23 is more than fine.
   - I now raise an error when the pandas version is too low. But, since it is 
actually working (mostly) for lower versions as well, we could also only give a 
warning instead. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6992:
URL: https://github.com/apache/arrow/pull/6992#issuecomment-616601667


   https://issues.apache.org/jira/browse/ARROW-7950



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox


wesm commented on issue #6954:
URL: https://github.com/apache/arrow/pull/6954#issuecomment-616604745


   Maybe it was just a thought I had in my head but never expressed. Opened 
https://issues.apache.org/jira/browse/ARROW-8531



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411416006



##
File path: .github/workflows/cpp.yml
##
@@ -228,7 +228,9 @@ jobs:
 run: ci/scripts/util_checkout.sh
   - name: Build
 shell: bash
-run: ci/scripts/cpp_build.sh $(pwd) $(pwd)/build
+run: |
+  export BOOST_ROOT=$BOOST_ROOT_1_72_0
+  ci/scripts/cpp_build.sh $(pwd) $(pwd)/build

Review comment:
   This is an unrelated CI fix.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-20 Thread GitBox


cyb70289 commented on issue #6954:
URL: https://github.com/apache/arrow/pull/6954#issuecomment-616585651


   > I thought we had discussed removing the `ARROW_USE_SIMD` option altogether
   
   @wesm remove `ARROW_USE_SIMD`? I remember a thread about default simd level( 
https://github.com/apache/arrow/pull/6907#issuecomment-612896453), but not 
`ARROW_USE_SIMD`.
   That said, I would like to remove it as it overlaps other setting.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org