[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359


   I've started a discussion on the 
[mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make 
other people aware of your efforts.
   
   I wonder if creating a `LETypedBufferBuilder` would make more sense than 
adding `AppendLE` methods (I don't think it makes sense to use two different 
endiannesses in a single buffer). It should probably be discussed on the ML.
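   For illustration, here is a minimal sketch of what such a builder could look
like. It is not Arrow's actual API; apart from the `LETypedBufferBuilder` name
used above, every detail below is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Illustrative sketch only: a typed builder that always stores appended values
// in little-endian byte order, regardless of the host's native byte order.
template <typename T>
class LETypedBufferBuilder {
 public:
  void Append(T value) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    value = ByteSwap(value);  // big-endian host: swap to little-endian
#endif
    const auto* bytes = reinterpret_cast<const uint8_t*>(&value);
    buffer_.insert(buffer_.end(), bytes, bytes + sizeof(T));
  }

  const std::vector<uint8_t>& bytes() const { return buffer_; }

 private:
  static T ByteSwap(T value) {
    uint8_t tmp[sizeof(T)];
    std::memcpy(tmp, &value, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T) / 2; ++i) {
      std::swap(tmp[i], tmp[sizeof(T) - 1 - i]);
    }
    std::memcpy(&value, tmp, sizeof(T));
    return value;
  }

  std::vector<uint8_t> buffer_;
};
```

   Used as `LETypedBufferBuilder<int32_t>`, `Append(1)` yields the bytes
`01 00 00 00` on both little-endian and big-endian hosts.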
   
   As for `Serialize`, I can't really tell you. Parquet encoding routines seem 
a bit all over the place (most of it happens in `encoding.cc`). Perhaps other 
developers can chime in...
   







[GitHub] [arrow] github-actions[bot] commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616667220


   https://issues.apache.org/jira/browse/ARROW-8477







[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616759329


   Thank you for starting the discussion. I will watch the thread.
   
   Yeah, `LETypedBufferBuilder` makes sense. It looks better than adding 
`AppendLE`.
   
   Regarding `Serialize`, it looks like a good place since the class has both the
Arrow and Parquet element types. But encoding (i.e. RLE and others) happens
in `encoding.cc`. Let me check it tomorrow.







[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


vertexclique commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411501620



##
File path: rust/arrow/src/array/builder.rs
##
@@ -236,6 +251,14 @@ impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
         self.write_bytes(v.to_byte_slice(), 1)
     }
 
+    default fn append_n(&mut self, n: usize, v: T::Native) -> Result<()> {
+        self.reserve(n)?;
+        for _ in 0..n {
+            self.write_bytes(v.to_byte_slice(), 1)?;
+        }

Review comment:
   Any preference for not using an iterator here?









[GitHub] [arrow] kiszk commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


kiszk commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411585778



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {
 template <typename T>
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length,
                                         T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   For the Parquet use case, the maximum bit width of the dictionary index is 32, based on 
https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8
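   To make the consequence concrete: because the dictionary index bit width is
capped at 32, a plain `int32_t` is always wide enough to hold a decoded index.
A small self-contained sketch of that bit-width reasoning (hypothetical helper,
not Arrow or parquet-cpp code):

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to encode dictionary indices 0..dictionary_length-1.
// Parquet caps this bit width at 32, so int32_t always suffices.
static int RequiredIndexBitWidth(int64_t dictionary_length) {
  int bits = 0;
  for (int64_t max_index = dictionary_length - 1; max_index > 0; max_index >>= 1) {
    ++bits;
  }
  return bits;
}

int main() {
  assert(RequiredIndexBitWidth(1) == 0);           // a single entry needs no index bits
  assert(RequiredIndexBitWidth(256) == 8);         // indices 0..255 fit in 8 bits
  assert(RequiredIndexBitWidth(INT32_MAX) <= 32);  // never exceeds the 32-bit cap
  return 0;
}
```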









[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142


   Your changes look good.  Thanks!







[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-20 Thread GitBox


BryanCutler commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r411513297



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java
##
@@ -34,31 +33,24 @@
   static final UnsafeDirectLittleEndian EMPTY = INNER_ALLOCATOR.empty;
   static final long CHUNK_SIZE = INNER_ALLOCATOR.getChunkSize();
 
-  private final int allocatedSize;
-  private final UnsafeDirectLittleEndian memoryChunk;
+  private final long allocatedSize;
 
-  NettyAllocationManager(BaseAllocator accountingAllocator, int requestedSize) {
-super(accountingAllocator);
-this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize);

Review comment:
   I don't think we should remove this; doing so effectively replaces all
allocations done in Arrow Java, which is a big change. `INNER_ALLOCATOR` also
uses a pool, which has some benefits. Instead, can you just change
`requestedSize` to be a long, then check whether it is over the max int size and
only then use `PlatformDependent.allocateMemory`?

##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ * <li>Make sure there are 4GB memory available in the system.</li>
+ * <li>
+ *   Make sure the default allocation manager type is unsafe.

Review comment:
   please update

##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ * <li>Make sure there are 4GB memory available in the system.</li>
+ * <li>
+ *   Make sure the default allocation manager type is unsafe.
+ *   This can be achieved by the environmental variable or system property.
+ *   The details can be found in {@link DefaultAllocationManagerOption}.
+ * </li>
+ */
+public class TestLargeArrowBuf {
+
+  private static void testLargeArrowBuf() {
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ ArrowBuf largeBuf = allocator.buffer(bufSize)) {
+  assertEquals(bufSize, largeBuf.capacity());
+  System.out.println("Successfully allocated a buffer with capacity " + 
largeBuf.capacity());
+
+  for (long i = 0; i < bufSize / 8; i++) {
+largeBuf.setLong(i * 8, i);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " long 
words");
+}
+  }
+  System.out.println("Successfully written " + (bufSize / 8) + " long 
words");
+
+  for (long i = 0; i < bufSize / 8; i++) {
+long val = largeBuf.getLong(i * 8);
+assertEquals(i, val);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully read " + (i + 1) + " long words");
+}
+  }
+  System.out.println("Successfully read " + (bufSize / 8) + " long words");
+}
+

[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616703869


   Looks like Windows long paths are enabled by default on Github Actions. Cool!







[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616748578


   The remaining CI failure is unrelated.







[GitHub] [arrow] tpboudreau commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411569025



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   I experienced failures in the CI pipeline on macOS and I was unable to 
locate clear documentation of the path name limits (I'm not a macOS expert). 
   
   I figured it might be best to separately address macOS in another issue, if 
there's community support for that.  (This patch leaves macOS builds and tests 
unchanged.)
   
   If you believe this test should run as is under macOS, I'll remove the 
#ifndef and follow up on any issues.
   
   









[GitHub] [arrow] tpboudreau opened a new pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau opened a new pull request #6993:
URL: https://github.com/apache/arrow/pull/6993


   This patch enables reading/writing of files with long (>260 characters) 
pathnames in Windows.
   
   In order for the new test to run under Windows, both (1) the test host must 
have long paths enabled in its registry, and (2) the test executable 
(arrow_utility_test.exe) must include a manifest indicating support for long 
paths (see 
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#enable-long-paths-in-windows-10-version-1607-and-later).
  The test source code checks for (1) and the cmake file changes ensure (2).
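   For reference, a check for (1) can query the registry value described in the
linked Microsoft document. The snippet below is only an illustrative sketch of
such a check, not the actual test code in this patch:

```cpp
#include <windows.h>

#include <iostream>

#pragma comment(lib, "advapi32.lib")  // RegGetValueW lives in advapi32

// Reads HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled,
// the switch documented in the Microsoft page linked above.
static bool HostHasLongPathsEnabled() {
  DWORD enabled = 0;
  DWORD size = sizeof(enabled);
  const LSTATUS status = ::RegGetValueW(
      HKEY_LOCAL_MACHINE, L"SYSTEM\\CurrentControlSet\\Control\\FileSystem",
      L"LongPathsEnabled", RRF_RT_REG_DWORD, nullptr, &enabled, &size);
  return status == ERROR_SUCCESS && enabled == 1;
}

int main() {
  std::cout << "Long paths enabled: "
            << (HostHasLongPathsEnabled() ? "yes" : "no") << std::endl;
  return 0;
}
```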







[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411547763



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   Why did you have to disable this test on macOS?









[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6993:
URL: https://github.com/apache/arrow/pull/6993#discussion_r411570280



##
File path: cpp/src/arrow/util/io_util_test.cc
##
@@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) {
   ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF"));
   ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn));
   ASSERT_TRUE(created);
+
+#ifndef __APPLE__

Review comment:
   I was just wondering. According to Google searches, the path length 
limit on macOS may be 1024, which this test exceeds. We can keep it like that.
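   One quick way to confirm the limit on a given host is to print `PATH_MAX`
(a sketch assuming a POSIX system where the macro is defined; on macOS it is
typically 1024):

```cpp
#include <climits>   // PATH_MAX on most POSIX systems
#include <iostream>

int main() {
#ifdef PATH_MAX
  std::cout << "PATH_MAX = " << PATH_MAX << std::endl;
#else
  std::cout << "PATH_MAX is not defined on this platform" << std::endl;
#endif
  return 0;
}
```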









[GitHub] [arrow] tpboudreau edited a comment on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau edited a comment on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142


   Your fixups look good.  Thanks!







[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


emkornfield commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411577540



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {
 template <typename T>
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length,
                                         T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   might be worth a comment here and below why IndexType is always static.  
Would it be possible to add a unit test with a larger value that would have 
failed?









[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


ursabot commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616716254


   [AMD64 Conda Crossbow Submit (#101910)](https://ci.ursalabs.org/#builders/98/builds/641) builder has succeeded.
   
   Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
ursabot-572](https://github.com/ursa-labs/crossbow/branches/all?query=ursabot-572)
   
   |Task|Status|
   |----|------|
   
|gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|







[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-20 Thread GitBox


pprudhvi commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-616715997


   @ursabot crossbow submit -g gandiva







[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows

2020-04-20 Thread GitBox


tpboudreau commented on issue #6993:
URL: https://github.com/apache/arrow/pull/6993#issuecomment-616759574


   Thanks @pitrou for jumping on this so quickly.







[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


pitrou commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411578282



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {
 template <typename T>
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length,
                                         T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   You mean a larger type, or an index larger than 2**32?









[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411675427



##
File path: rust/arrow/src/array/builder.rs
##
@@ -236,6 +251,14 @@ impl<T: ArrowPrimitiveType> BufferBuilderTrait<T> for BufferBuilder<T> {
         self.write_bytes(v.to_byte_slice(), 1)
     }
 
+    default fn append_n(&mut self, n: usize, v: T::Native) -> Result<()> {
+        self.reserve(n)?;
+        for _ in 0..n {
+            self.write_bytes(v.to_byte_slice(), 1)?;
+        }

Review comment:
   I'm not sure I understand what you mean?









[GitHub] [arrow] bkietz commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


bkietz commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616819386


   @github-actions crossbow submit -g nightly







[GitHub] [arrow] bkietz opened a new pull request #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


bkietz opened a new pull request #6994:
URL: https://github.com/apache/arrow/pull/6994


   Add a `status.json` to the gh-pages summary of nightly builds to get around 
rate limiting







[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887183


   @github-actions crossbow submit test-r-linux-as-cran







[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820980


   https://issues.apache.org/jira/browse/ARROW-8043







[GitHub] [arrow] kou commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


kou commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616819824


   Wow! Awesome!







[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6994:
URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820253


   Revision: 89cf7325ab761a35b0c8a0da7096805984e18435
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-156](https://github.com/ursa-labs/crossbow/branches/all?query=actions-156)
   
   |Task|Status|
   |----|------|
   |centos-6-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-6-amd64)|
   |centos-7-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-7-amd64)|
   |centos-8-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-8-amd64)|
   
|conda-linux-gcc-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py36)|
   
|conda-linux-gcc-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py37)|
   
|conda-linux-gcc-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py38)|
   
|conda-osx-clang-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py36)|
   
|conda-osx-clang-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py37)|
   
|conda-osx-clang-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py38)|
   
|conda-win-vs2015-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py36)|
   
|conda-win-vs2015-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py37)|
   
|conda-win-vs2015-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py38)|
   |debian-buster-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-buster-amd64)|
   |debian-stretch-amd64|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-stretch-amd64)|
   
|gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   
|homebrew-r-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-r-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|
   |test-conda-cpp|[![Github 

[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-20 Thread GitBox


kou commented on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616869503


   Thanks!







[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881056


   @github-actions crossbow submit test-r-linux-as-cran







[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881426


   Revision: 1ed83aaf5dd17d4e3b31aa1cc657f1220da2c8d4
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-157](https://github.com/ursa-labs/crossbow/branches/all?query=actions-157)
   
   |Task|Status|
   |----|------|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-157-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-157-github-test-r-linux-as-cran)|







[GitHub] [arrow] nealrichardson opened a new pull request #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


nealrichardson opened a new pull request #6995:
URL: https://github.com/apache/arrow/pull/6995


   Having some trouble/slowness with r-hub for testing, so I made this PR to use 
crossbow.







[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887542


   Revision: 88c0198d775796d5a39644a22840a45470b4253f
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-158](https://github.com/ursa-labs/crossbow/branches/all?query=actions-158)
   
   |Task|Status|
   |----|------|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-158-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-158-github-test-r-linux-as-cran)|







[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669


   Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537







[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927551


   Revision: 69081241244da5decee0bf0ea3cb2f24059d244d
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-159](https://github.com/ursa-labs/crossbow/branches/all?query=actions-159)
   
   |Task|Status|
   |----|------|
   
|homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-159-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)|







[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


nealrichardson commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927211


   @github-actions crossbow submit homebrew-cpp







[GitHub] [arrow] nealrichardson opened a new pull request #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


nealrichardson opened a new pull request #6996:
URL: https://github.com/apache/arrow/pull/6996


   One more I didn't remove in ARROW-8222.







[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-616931067


   https://issues.apache.org/jira/browse/ARROW-8538







[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-20 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079


   @pitrou @wesm 
   Oops, I only checked the "BitmapReader" case from the 
[arrow-bit-util-benchmark](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_benchmark.cc) 
benchmark. Obviously that's not enough.
   
   I compared all cases just now and see a huge performance drop in the 4 tests below:
   
   Before this patch:
   ```bash
   BenchmarkBitmapAnd/32768/1      563496 ns     563260 ns   1243   bytes_per_second=55.4806M/s
   BenchmarkBitmapAnd/131072/1    2219810 ns    2218984 ns    318   bytes_per_second=56.3321M/s
   BenchmarkBitmapAnd/32768/2      561738 ns     561467 ns   1265   bytes_per_second=55.6577M/s
   BenchmarkBitmapAnd/131072/2    2246229 ns    2245119 ns    305   bytes_per_second=55.6763M/s
   ```
   
   After this patch:
   ```bash
   BenchmarkBitmapAnd/32768/1     1653467 ns    1652680 ns    422   bytes_per_second=18.9087M/s
   BenchmarkBitmapAnd/131072/1    6665501 ns    6661561 ns    105   bytes_per_second=18.7644M/s
   BenchmarkBitmapAnd/32768/2     1670793 ns    1670246 ns    423   bytes_per_second=18.7098M/s
   BenchmarkBitmapAnd/131072/2    6702369 ns    6698957 ns    103   bytes_per_second=18.6596M/s
   ```
   
   Before reverting this patch, I would like to understand why it happens.
   
   BTW: we definitely need continuous benchmark tools to detect these things 
early.







[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252


   This change introduces severe branch misses in certain conditions. See the perf 
logs below. I changed the benchmark code to run only the problematic test case.
   
   Without this patch
   ```bash
      807.415826  task-clock (msec)        #    0.979 CPUs utilized
              83  context-switches         #    0.103 K/sec
               0  cpu-migrations           #    0.000 K/sec
             427  page-faults              #    0.529 K/sec
   2,285,801,407  cycles                   #    2.831 GHz                      (83.17%)
       2,313,785  stalled-cycles-frontend  #    0.10% frontend cycles idle     (83.16%)
     915,631,177  stalled-cycles-backend   #   40.06% backend cycles idle      (82.93%)
   9,997,208,858  instructions             #    4.37  insn per cycle
                                           #    0.09  stalled cycles per insn  (83.66%)
   1,679,799,451  branches                 # 2080.464 M/sec                    (83.66%)
         106,599  branch-misses            #    0.01% of all branches          (83.41%)
   ```
   
   With this patch
   ```bash
      902.557236  task-clock (msec)        #    0.980 CPUs utilized
              94  context-switches         #    0.104 K/sec
               0  cpu-migrations           #    0.000 K/sec
             427  page-faults              #    0.473 K/sec
   2,567,879,767  cycles                   #    2.845 GHz                      (83.17%)
      88,266,680  stalled-cycles-frontend  #    3.44% frontend cycles idle     (83.17%)
      20,826,862  stalled-cycles-backend   #    0.81% backend cycles idle      (83.03%)
   2,518,949,193  instructions             #    0.98  insn per cycle
                                           #    0.04  stalled cycles per insn  (83.62%)
     847,459,928  branches                 #  938.954 M/sec                    (83.61%)
      75,187,208  branch-misses            #    8.87% of all branches          (83.39%)
   ```
   Absolute counts are not comparable as the benchmark framework runs a different 
number of iterations for each test.
   The point is that branch-misses jump from 0.01% to 8.87%, which causes a high 
frontend stall (the CPU waits to fetch code to execute), and IPC (instructions per 
cycle) drops from 4.37 to 0.98.
   
   I didn't figure out which branch is mispredicted and why. My Haswell CPU is too 
old to support branch tracing. My guess is [this 
line](https://github.com/apache/arrow/blob/5093b809d63ac8db99aec9caa7ad7e723f277c46/cpp/src/arrow/util/bit_util.cc#L285), 
but I have no concrete justification.







[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-20 Thread GitBox


emkornfield commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r411882849



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {
 template <typename T>
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length,
                                         T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   I think I misunderstood the issue for the unit testing.  I guess this 
would have been caught if we had a big endian machine?  
   
   I think adding the comment with the link that @kiszk provided would still 
make sense for the casual reader?









[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data

2020-04-21 Thread GitBox


pitrou commented on a change in pull request #6991:
URL: https://github.com/apache/arrow/pull/6991#discussion_r412025224



##
File path: cpp/src/arrow/util/rle_encoding.h
##
@@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) {
 template <typename T>
 inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length,
                                         T* values, int batch_size) {
+  using IndexType = int32_t;

Review comment:
   Note that this was previously simply `int`. The change here is simply to 
make things clearer and also note the index size explicitly.









[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


pitrou commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617033346


   To be honest, `BitmapAnd` should probably be rewritten using 
`Bitmap::VisitWords`.
   But we can revert anyway if we fear regressions may appear in other 
workloads.
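   For readers unfamiliar with the idea, the gain from a `VisitWords`-style
rewrite comes from combining the bitmaps one 64-bit word at a time instead of
one bit at a time. A standalone sketch of that word-wise approach follows; it
is not Arrow's implementation, and it ignores bit offsets and trailing partial
words:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Word-at-a-time AND of two equally sized, word-aligned bitmaps:
// 64 bits per iteration and no per-bit branches.
std::vector<uint64_t> BitmapAndWords(const std::vector<uint64_t>& left,
                                     const std::vector<uint64_t>& right) {
  std::vector<uint64_t> out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    out[i] = left[i] & right[i];
  }
  return out;
}
```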







[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox


pitrou commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-617061541


   "The job exceeded the maximum log length, and has been terminated." -- 
restarting







[GitHub] [arrow] pitrou commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox


pitrou commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-617067706


   Wow, that is compiling OpenSSL by hand?







[GitHub] [arrow] wesm commented on issue #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

2020-04-21 Thread GitBox


wesm commented on issue #6744:
URL: https://github.com/apache/arrow/pull/6744#issuecomment-617169894


   Taking a look at this







[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox


liyafan82 commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r412067781



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java
##
@@ -34,31 +33,24 @@
   static final UnsafeDirectLittleEndian EMPTY = INNER_ALLOCATOR.empty;
   static final long CHUNK_SIZE = INNER_ALLOCATOR.getChunkSize();
 
-  private final int allocatedSize;
-  private final UnsafeDirectLittleEndian memoryChunk;
+  private final long allocatedSize;
 
-  NettyAllocationManager(BaseAllocator accountingAllocator, int requestedSize) 
{
-super(accountingAllocator);
-this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize);

Review comment:
   Revised. Thank you for the good suggestion. 









[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox


liyafan82 commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r412067968



##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ * <li>Make sure there are 4GB memory available in the system.</li>
+ * <li>
+ *   Make sure the default allocation manager type is unsafe.

Review comment:
   Nice catch. Thank you. 









[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build

2020-04-21 Thread GitBox


pprudhvi commented on issue #6990:
URL: https://github.com/apache/arrow/pull/6990#issuecomment-617111538


   Let's wait till https://github.com/Homebrew/homebrew-core/pull/53445/files is 
merged. See https://issues.apache.org/jira/browse/ARROW-8539







[GitHub] [arrow] kszucs opened a new pull request #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox


kszucs opened a new pull request #6999:
URL: https://github.com/apache/arrow/pull/6999


   







[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox


jorisvandenbossche commented on a change in pull request #7000:
URL: https://github.com/apache/arrow/pull/7000#discussion_r412150862



##
File path: cpp/src/arrow/dataset/dataset.h
##
@@ -30,12 +30,22 @@
 namespace arrow {
 namespace dataset {
 
-/// \brief A granular piece of a Dataset, such as an individual file, which can be
-/// read/scanned separately from other fragments.
+/// \brief A granular piece of a Dataset, such as an individual file.
 ///
-/// A Fragment yields a collection of RecordBatch, encapsulated in one or more ScanTasks.
+/// A Fragment can be read/scanned separately from other fragments. It yields a
+/// collection of RecordBatch, encapsulated in one or more ScanTasks.
+///
+/// A notable difference from Dataset is that Fragments have physical schemas
+/// which may differ from Fragments.

Review comment:
   ```suggestion
   /// which may differ from other Fragments.
   ```

##
File path: python/pyarrow/_dataset.pyx
##
@@ -519,30 +500,69 @@ cdef class Fragment:
 """
 return Expression.wrap(self.fragment.partition_expression())
 
-def to_table(self, use_threads=True, MemoryPool memory_pool=None):
-"""Convert this Fragment into a Table.
+def _scanner(self, **kwargs):
+return Scanner.from_fragment(self, **kwargs)
 
-Use this convenience utility with care. This will serially materialize
-the Scan result in memory before creating the Table.
+def scan(self, columns=None, filter=None, use_threads=True,
+ MemoryPool memory_pool=None, **kwargs):
+"""Builds a scan operation against the dataset.
+

Review comment:
   When using Fragment.scan, it uses the Fragment's physical schema for the 
resulting table? (since the Fragment is not aware of the dataset "read" 
schema?) 
   If so, we should note that here in the docstring I think

##
File path: cpp/src/arrow/dataset/file_parquet.cc
##
@@ -433,26 +430,22 @@ Result 
ParquetFileFormat::GetRowGroupFragments(
   }
   FragmentVector fragments(row_groups.size());
 
-  auto new_options = std::make_shared(*fragment.scan_options());
-  if (!extra_filter->Equals(true)) {
-new_options->filter = and_(std::move(extra_filter), 
std::move(new_options->filter));
-  }
-
-  RowGroupSkipper skipper(std::move(metadata), std::move(arrow_properties),
-  new_options->filter, std::move(row_groups));
+  RowGroupSkipper skipper(std::move(metadata), std::move(arrow_properties), 
extra_filter,

Review comment:
   We should probably rename "extra_filter" to just "filter" or "predicate" 
as how it is called in Dataset::GetFragments, since it is no longer "extra" ?

##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -671,41 +669,29 @@ def test_fragments(tempdir):
 f = fragments[0]
 
 # file's schema does not include partition column
-phys_schema = f.schema.remove(f.schema.get_field_index('part'))
-assert f.format.inspect(f.path, f.filesystem) == phys_schema
+assert f.physical_schema.names == ['f1', 'f2']
+assert f.format.inspect(f.path, f.filesystem) == f.physical_schema
 assert f.partition_expression.equals(ds.field('part') == 'a')
 
 # scanning fragment includes partition columns
-result = f.to_table()
-assert f.schema == result.schema
+result = f.to_table(schema=dataset.schema)

Review comment:
   Can you also test without passing the dataset's schema, and assert that 
the column_names are [f1, f2] ?

##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -671,41 +669,29 @@ def test_fragments(tempdir):
 f = fragments[0]
 
 # file's schema does not include partition column
-phys_schema = f.schema.remove(f.schema.get_field_index('part'))
-assert f.format.inspect(f.path, f.filesystem) == phys_schema
+assert f.physical_schema.names == ['f1', 'f2']
+assert f.format.inspect(f.path, f.filesystem) == f.physical_schema
 assert f.partition_expression.equals(ds.field('part') == 'a')
 
 # scanning fragment includes partition columns
-result = f.to_table()
-assert f.schema == result.schema
+result = f.to_table(schema=dataset.schema)
 assert result.column_names == ['f1', 'f2', 'part']
-assert len(result) == 4
 assert result.equals(table.slice(0, 4))
-
-# scanning fragments follow column projection
-fragments = list(dataset.get_fragments(columns=['f1', 'part']))

Review comment:
   Keep this, but with the column selection passed to `to_table`?









[GitHub] [arrow] wesm edited a comment on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-21 Thread GitBox


wesm edited a comment on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-617164152


   The copy-pasta in the .yml files is a bummer. I hope one day for a higher 
level specification of these tasks (thank you for fixing the disk usage issue, 
though!)







[GitHub] [arrow] wesm commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-21 Thread GitBox


wesm commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-617164152


   The copy-pasta in the .yml files is a bummer. I hope one day for a higher 
level specification of these tasks







[GitHub] [arrow] gramirezespinoza commented on issue #6977: Missing `take` method in pyarrow's `Table` class

2020-04-21 Thread GitBox


gramirezespinoza commented on issue #6977:
URL: https://github.com/apache/arrow/issues/6977#issuecomment-617178347


   Waiting for #6970 to be approved/merged







[GitHub] [arrow] bkietz commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox


bkietz commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r412194818



##
File path: dev/archery/archery/integration/datagen.py
##
@@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():
   dictionaries=[dict0, dict1, dict2])
 
 
+def generate_extension_case():
+uuid_type = ExtensionType('uuid', 'uuid-serialization',
+  FixedSizeBinaryField('', 16))
+
+fields = [
+ExtensionField('uuids', uuid_type),

Review comment:
   Should we also test dictionary(ext) and ext(storage=dictionary)?









[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox


pitrou commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r412227300



##
File path: dev/archery/archery/integration/datagen.py
##
@@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():
   dictionaries=[dict0, dict1, dict2])
 
 
+def generate_extension_case():
+uuid_type = ExtensionType('uuid', 'uuid-serialization',
+  FixedSizeBinaryField('', 16))
+
+fields = [
+ExtensionField('uuids', uuid_type),

Review comment:
   dictionary(ext) is not possible as per 
https://mail-archives.apache.org/mod_mbox/arrow-dev/202004.mbox/%3CCAJPUwMAvxLYxJg_QRgdALSq3XS%2BY0zV_LYwsd6FVNYbA90RAAw%40mail.gmail.com%3E
 , but I'll try to add ext(dictionary).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617087421


   I think the current Parquet writer test cases do not verify the bit pattern 
of the generated Parquet file. I will create such test cases in another PR, 
since they are important on platforms with a different native endianness.
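   For illustration, a minimal standalone sketch of such a bit-pattern check 
(the value and the serialization step are hypothetical, not taken from the 
Arrow test suite): serialize a value and compare the raw bytes against the 
expected little-endian pattern, so the test fails on a big-endian host unless 
the writer converts before writing.
   ```cpp
   // Hypothetical bit-pattern check (not from the Arrow test suite): serialize
   // a value and compare the raw bytes against the expected little-endian
   // pattern. On a big-endian host this fails unless the writer converts.
   #include <cassert>
   #include <cstdint>
   #include <cstring>

   int main() {
     const uint32_t value = 0x01020304;
     const uint8_t expected_le[4] = {0x04, 0x03, 0x02, 0x01};

     uint8_t serialized[4];
     // Stand-in for the writer's serialization step; a compliant writer emits
     // little-endian bytes here regardless of the host's native byte order.
     std::memcpy(serialized, &value, sizeof(value));

     assert(std::memcmp(serialized, expected_le, sizeof(expected_le)) == 0);
     return 0;
   }
   ```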



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6997: ARROW-8540: [C++] Add memory allocation benchmarks

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #6997:
URL: https://github.com/apache/arrow/pull/6997#issuecomment-617116743


   https://issues.apache.org/jira/browse/ARROW-8540



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-21 Thread GitBox


kszucs opened a new pull request #6998:
URL: https://github.com/apache/arrow/pull/6998


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617126711


   Perhaps. If the reader is compatible with those files, and roundtripping 
works, then the writer is probably compliant as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] fsaintjacques opened a new pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox


fsaintjacques opened a new pull request #7000:
URL: https://github.com/apache/arrow/pull/7000


   This is the first part of a refactor to make Fragment accessible without a 
Scan operation instance. This is a breaking change. It introduces the concepts 
of a physical schema and a read schema; these are analogous to Avro's writer 
and reader schemas.
   
   - Move ScanOptions to Fragment::Scan instead of a property
   - Refactor Dataset::GetFragments(ScanOptions) to 
Dataset::GetFragments(Expression)
   - Add Fragment::ReadPhysicalSchema



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox


liyafan82 commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r412069840



##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.memory;
+
+import static org.junit.Assert.assertEquals;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}.
+ * To run this test, please
+ *Make sure there are 4GB memory available in the system.
+ * 
+ *   Make sure the default allocation manager type is unsafe.
+ *   This can be achieved by the environmental variable or system property.
+ *   The details can be found in {@link DefaultAllocationManagerOption}.
+ * 
+ */
+public class TestLargeArrowBuf {
+
+  private static void testLargeArrowBuf() {
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ ArrowBuf largeBuf = allocator.buffer(bufSize)) {
+  assertEquals(bufSize, largeBuf.capacity());
+  System.out.println("Successfully allocated a buffer with capacity " + 
largeBuf.capacity());
+
+  for (long i = 0; i < bufSize / 8; i++) {
+largeBuf.setLong(i * 8, i);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " long 
words");
+}
+  }
+  System.out.println("Successfully written " + (bufSize / 8) + " long 
words");
+
+  for (long i = 0; i < bufSize / 8; i++) {
+long val = largeBuf.getLong(i * 8);
+assertEquals(i, val);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully read " + (i + 1) + " long words");
+}
+  }
+  System.out.println("Successfully read " + (bufSize / 8) + " long words");
+}
+System.out.println("Successfully released the large buffer.");
+  }
+
+  public static void main(String[] args) {

Review comment:
   Sounds good to me. 
   The problem is that we set arrow.vector.max_allocation_bytes to 1048576 for 
every test case (to avoid OOM).  Please see the pom.xml file. 
   
   So if we convert it to a test case, we cannot allocate too much memory. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox


liyafan82 commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r412070083



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for a vector with a large (more than 2GB) {@link 
io.netty.buffer.ArrowBuf} as
+ * the data buffer.
+ * To run this test, please
+ *Make sure there are 4GB memory available in the system.
+ * 
+ *   Make sure the default allocation manager type is unsafe.

Review comment:
   Revised. Thank you.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112291


   @kiszk The preferred way to do that would be to add a file to the 
https://github.com/apache/parquet-testing repository. It's checked in as a 
submodule in `cpp/submodules` and used in the Parquet test suite.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112444


   (also look for the "PARQUET_TEST_DATA" environment variable)
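   As a rough sketch (the helper name and file name below are hypothetical, and 
only `parquet::ParquetFileReader::OpenFile` is an existing API call), a 
reader-side test could resolve a checked-in file from the parquet-testing 
submodule like this:
   ```cpp
   // Hypothetical helper: resolve a file via the PARQUET_TEST_DATA environment
   // variable and open it with the Parquet file reader.
   #include <cstdlib>
   #include <memory>
   #include <stdexcept>
   #include <string>

   #include "parquet/file_reader.h"

   std::unique_ptr<parquet::ParquetFileReader> OpenTestFile(const std::string& name) {
     const char* dir = std::getenv("PARQUET_TEST_DATA");
     if (dir == nullptr) {
       throw std::runtime_error("PARQUET_TEST_DATA is not set");
     }
     return parquet::ParquetFileReader::OpenFile(std::string(dir) + "/" + name);
   }
   ```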



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617150324


   In the meantime, when we have microperformance patches like these it would 
be a good practice in the future to make sure that performance results are 
reproduced in the codebase's benchmark executables rather than on an ad hoc 
basis
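   As a minimal sketch of what that looks like in practice (the benchmark name 
and loop below are illustrative, not an existing Arrow benchmark), a 
micro-optimization can be pinned down with a Google Benchmark case in the 
codebase's benchmark executables:
   ```cpp
   // Standalone Google Benchmark sketch: measures a naive bit-by-bit read over
   // a bitmap so a regression in this pattern shows up in the benchmark runs.
   #include <cstdint>
   #include <vector>

   #include "benchmark/benchmark.h"

   static void BM_ReadAllBits(benchmark::State& state) {
     const int64_t nbits = state.range(0);
     std::vector<uint8_t> bitmap(nbits / 8, 0xAA);
     for (auto _ : state) {
       int64_t total = 0;
       for (int64_t i = 0; i < nbits; ++i) {
         total += (bitmap[i / 8] >> (i % 8)) & 1;  // read one bit at a time
       }
       benchmark::DoNotOptimize(total);
     }
     state.SetItemsProcessed(state.iterations() * nbits);
   }
   BENCHMARK(BM_ReadAllBits)->Arg(1 << 20);

   BENCHMARK_MAIN();
   ```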



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6999: ARROW-8542: [Release] Fix checksum url in the website post release script

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #6999:
URL: https://github.com/apache/arrow/pull/6999#issuecomment-617150142


   https://issues.apache.org/jira/browse/ARROW-8542



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations

2020-04-21 Thread GitBox


liyafan82 commented on a change in pull request #6323:
URL: https://github.com/apache/arrow/pull/6323#discussion_r412070422



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/TestLargeVector.java
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+
+import io.netty.buffer.ArrowBuf;
+
+/**
+ * Integration test for a vector with a large (more than 2GB) {@link 
io.netty.buffer.ArrowBuf} as
+ * the data buffer.
+ * To run this test, please
+ *Make sure there are 4GB memory available in the system.
+ * 
+ *   Make sure the default allocation manager type is unsafe.
+ *   This can be achieved by the environmental variable or system property.
+ *   The details can be found in {@link DefaultAllocationManagerOption}.
+ * 
+ */
+public class TestLargeVector {
+  private static void testLargeLongVector() {
+System.out.println("Testing large big int vector.");
+
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+final int vecLength = (int) (bufSize / BigIntVector.TYPE_WIDTH);
+
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+BigIntVector largeVec = new BigIntVector("vec", allocator)) {
+  largeVec.allocateNew(vecLength);
+
+  System.out.println("Successfully allocated a vector with capacity " + 
vecLength);
+
+  for (int i = 0; i < vecLength; i++) {
+largeVec.set(i, i * 10L);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " values");
+}
+  }
+  System.out.println("Successfully written " + vecLength + " values");
+
+  for (int i = 0; i < vecLength; i++) {
+long val = largeVec.get(i);
+assertEquals(i * 10L, val);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully read " + (i + 1) + " values");
+}
+  }
+  System.out.println("Successfully read " + vecLength + " values");
+}
+System.out.println("Successfully released the large vector.");
+  }
+
+  private static void testLargeIntVector() {
+System.out.println("Testing large int vector.");
+
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+final int vecLength = (int) (bufSize / IntVector.TYPE_WIDTH);
+
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ IntVector largeVec = new IntVector("vec", allocator)) {
+  largeVec.allocateNew(vecLength);
+
+  System.out.println("Successfully allocated a vector with capacity " + 
vecLength);
+
+  for (int i = 0; i < vecLength; i++) {
+largeVec.set(i, i);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " values");
+}
+  }
+  System.out.println("Successfully written " + vecLength + " values");
+
+  for (int i = 0; i < vecLength; i++) {
+long val = largeVec.get(i);
+assertEquals(i, val);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully read " + (i + 1) + " values");
+}
+  }
+  System.out.println("Successfully read " + vecLength + " values");
+}
+System.out.println("Successfully released the large vector.");
+  }
+
+  private static void testLargeDecimalVector() {
+System.out.println("Testing large decimal vector.");
+
+final long bufSize = 4 * 1024 * 1024 * 1024L;
+final int vecLength = (int) (bufSize / DecimalVector.TYPE_WIDTH);
+
+try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
+ DecimalVector largeVec = new DecimalVector("vec", allocator, 38, 16)) 
{
+  largeVec.allocateNew(vecLength);
+
+  System.out.println("Successfully allocated a vector with capacity " + 
vecLength);
+
+  for (int i = 0; i < vecLength; i++) {
+largeVec.set(i, 0);
+
+if ((i + 1) % 1 == 0) {
+  System.out.println("Successfully written " + (i + 1) + " values");
+}
+  }
+  System.out.println("Successfully written " + 

[GitHub] [arrow] pitrou opened a new pull request #6997: ARROW-8540: [C++] Add memory allocation benchmarks

2020-04-21 Thread GitBox


pitrou opened a new pull request #6997:
URL: https://github.com/apache/arrow/pull/6997


   Example output:
   ```
   
   ------------------------------------------------------------------------------------
   Benchmark                                             Time         CPU   Iterations
   ------------------------------------------------------------------------------------
   TouchArea/size:4096/real_time                      20.1 ns     20.1 ns     34893671
   TouchArea/size:65536/real_time                      483 ns      483 ns      1448647
   TouchArea/size:1048576/real_time                   7670 ns     7669 ns        90816
   TouchArea/size:16777216/real_time                124297 ns   124280 ns         5611

   AllocateDeallocate/size:4096/real_time             18.6 ns     18.6 ns     37781939
   AllocateDeallocate/size:65536/real_time             161 ns      161 ns      4360765
   AllocateDeallocate/size:1048576/real_time           328 ns      328 ns      2131288
   AllocateDeallocate/size:16777216/real_time          160 ns      160 ns      4366862
   AllocateTouchDeallocate/size:4096/real_time        40.4 ns     40.4 ns     17333165
   AllocateTouchDeallocate/size:65536/real_time        640 ns      640 ns      1092988
   AllocateTouchDeallocate/size:1048576/real_time     7959 ns     7958 ns        87693
   AllocateTouchDeallocate/size:16777216/real_time  124816 ns   124801 ns         5602

   AllocateDeallocate/size:4096/real_time             22.2 ns     22.2 ns     31611774
   AllocateDeallocate/size:65536/real_time             157 ns      157 ns      4460745
   AllocateDeallocate/size:1048576/real_time           330 ns      330 ns      2113808
   AllocateDeallocate/size:16777216/real_time          158 ns      158 ns      4439623
   AllocateTouchDeallocate/size:4096/real_time        43.0 ns     43.0 ns     16252256
   AllocateTouchDeallocate/size:65536/real_time        638 ns      638 ns      1091897
   AllocateTouchDeallocate/size:1048576/real_time     7961 ns     7960 ns        87755
   AllocateTouchDeallocate/size:16777216/real_time  124699 ns   124682 ns         5588

   AllocateDeallocate/size:4096/real_time              232 ns      232 ns      3015215
   AllocateDeallocate/size:65536/real_time             153 ns      153 ns      4527945
   AllocateDeallocate/size:1048576/real_time           146 ns      146 ns      4720662
   AllocateDeallocate/size:16777216/real_time          144 ns      144 ns      4859165
   AllocateTouchDeallocate/size:4096/real_time         254 ns      254 ns      2750031
   AllocateTouchDeallocate/size:65536/real_time        635 ns      635 ns      1100267
   AllocateTouchDeallocate/size:1048576/real_time     7753 ns     7752 ns        89887
   AllocateTouchDeallocate/size:16777216/real_time  124518 ns   124501 ns         5604
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617113951


   Thank you for your suggestion. I think these files are currently used only 
for read tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6998: ARROW-8541: [Release] Don't remove previous source releases automatically

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #6998:
URL: https://github.com/apache/arrow/pull/6998#issuecomment-617130185


   https://issues.apache.org/jira/browse/ARROW-8541



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617149608


   > BTW: we definitely need continuous benchmark tools to detect these things 
early.
   
   Agreed. Hopefully some progress can be made on this in 2020 since the prior 
discussion in 2019 didn't go anywhere. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-617149858


   I agree. Once the implementation is stable, that approach looks good. During 
development, though, developers want to test the reader and writer 
independently. At least, I do.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


wesm edited a comment on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379


   Cool, nice improvement (is this captured in our benchmark executables?)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #7000:
URL: https://github.com/apache/arrow/pull/7000#issuecomment-617157131


   https://issues.apache.org/jira/browse/ARROW-8065



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-617235575


   Revision: f543317d36d39322bd339b49dd8867cbd3f2ad70
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-160](https://github.com/ursa-labs/crossbow/branches/all?query=actions-160)
   
   |Task|Status|
   |----|------|
   |test-r-linux-as-cran|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-160-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-160-github-test-r-linux-as-cran)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep

2020-04-21 Thread GitBox


nealrichardson commented on issue #6995:
URL: https://github.com/apache/arrow/pull/6995#issuecomment-617234711


   @github-actions crossbow submit test-r-linux-as-cran



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7000: ARROW-8065: [C++][Dataset] Refactor ScanOptions and Fragment relation

2020-04-21 Thread GitBox


jorisvandenbossche commented on a change in pull request #7000:
URL: https://github.com/apache/arrow/pull/7000#discussion_r412191161



##
File path: python/pyarrow/tests/test_dataset.py
##
@@ -671,41 +669,29 @@ def test_fragments(tempdir):
 f = fragments[0]
 
 # file's schema does not include partition column
-phys_schema = f.schema.remove(f.schema.get_field_index('part'))
-assert f.format.inspect(f.path, f.filesystem) == phys_schema
+assert f.physical_schema.names == ['f1', 'f2']
+assert f.format.inspect(f.path, f.filesystem) == f.physical_schema
 assert f.partition_expression.equals(ds.field('part') == 'a')
 
 # scanning fragment includes partition columns
-result = f.to_table()
-assert f.schema == result.schema
+result = f.to_table(schema=dataset.schema)
 assert result.column_names == ['f1', 'f2', 'part']
-assert len(result) == 4
 assert result.equals(table.slice(0, 4))
-
-# scanning fragments follow column projection
-fragments = list(dataset.get_fragments(columns=['f1', 'part']))

Review comment:
   Keep this but where the columns selection is passed to `to_table` ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-21 Thread GitBox


wesm commented on issue #6970:
URL: https://github.com/apache/arrow/pull/6970#issuecomment-617250883


   +1. Appveyor build looks good 
https://ci.appveyor.com/project/wesm/arrow/builds/32336612



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617227451


   @wesm, actually I did use the codebase's benchmark executable. The problem is 
that I only focused on the one case directly related to this change and ignored 
other cases that looked irrelevant, which turned out to be impacted after all.
   I guess I can write a simple script to automate benchmark result checking as 
a temporary solution.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-21 Thread GitBox


wesm commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-617228774


   FWIW, we have some benchmark diffing code already written in 
   
   https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark
   
   I'm not sure where this is documented / how to use it to check the output of 
a single executable 
   
   cc @fsaintjacques 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula

2020-04-21 Thread GitBox


nealrichardson commented on issue #6996:
URL: https://github.com/apache/arrow/pull/6996#issuecomment-617232353


   ¯\_(ツ)_/¯ maybe it's time to port this job to GHA



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


kiszk commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385


   I have been thinking about candidate places for the interface between the 
native endianness and Parquet's little-endian format.
   
   One good candidate is `Serialize()` in `parquet/column_writer.cc`. Another 
candidate is `TypedBufferBuilder` in `arrow/buffer_builder.h`.
   
   Regarding `Serialize()`, there is already [a conversion 
loop](https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1781-1783)
 for Decimal128 that uses big-endian. On big-endian platforms, `Serialize()` for 
the other primitive types, including Int96, needs a similar conversion loop to 
little-endian. This would be the first step.
   
   Since the above approach adds overhead, as a second step it would be good to 
have new methods `AppendLE` and `UnsafeAppendLE` in `TypedBufferBuilder`, in 
addition to [`Append()` and 
`UnsafeAppend`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L204-L240).
 These new methods would ensure that typed data is written in little-endian.
   
   I think we can support big-endian in Parquet using this two-step approach. 
What do you think?
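   For illustration, a minimal standalone sketch of what an `AppendLE`-style 
helper could look like (the names and the compile-time endianness check here 
are assumptions for discussion, not the actual Arrow API; Arrow has its own 
endianness detection macros):
   ```cpp
   // Illustrative only: always store a value in little-endian byte order,
   // swapping on big-endian hosts so the buffer matches the Parquet layout.
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <cstring>
   #include <utility>
   #include <vector>

   constexpr bool kHostIsLittleEndian =
   #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
       false;
   #else
       true;
   #endif

   template <typename T>
   void AppendLE(std::vector<uint8_t>* out, T value) {
     uint8_t bytes[sizeof(T)];
     std::memcpy(bytes, &value, sizeof(T));
     if (!kHostIsLittleEndian) {
       // Swap so the buffer always holds the little-endian representation.
       for (std::size_t i = 0; i < sizeof(T) / 2; ++i) {
         std::swap(bytes[i], bytes[sizeof(T) - 1 - i]);
       }
     }
     out->insert(out->end(), bytes, bytes + sizeof(T));
   }

   int main() {
     std::vector<uint8_t> buf;
     AppendLE<int32_t>(&buf, 0x01020304);
     // The stored pattern is the same on little- and big-endian hosts.
     assert(buf[0] == 0x04 && buf[3] == 0x01);
     return 0;
   }
   ```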
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox


pitrou commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r412310009



##
File path: cpp/src/arrow/ipc/metadata_internal.cc
##
@@ -756,10 +737,35 @@ Status FieldFromFlatbuffer(const flatbuf::Field* field, 
DictionaryMemo* dictiona
 RETURN_NOT_OK(IntFromFlatbuffer(int_data, _type));
 ARROW_ASSIGN_OR_RAISE(type,
   DictionaryType::Make(index_type, type, 
encoding->isOrdered()));
-*out = ::arrow::field(field_name, type, field->nullable(), metadata);
-RETURN_NOT_OK(dictionary_memo->AddField(encoding->id(), *out));
-  } else {
-*out = ::arrow::field(field_name, type, field->nullable(), metadata);
+dictionary_id = encoding->id();
+  }
+
+  // 4. Is it an extension type?
+  if (metadata != nullptr) {
+// Look for extension metadata in custom_metadata field
+int name_index = metadata->FindKey(kExtensionTypeKeyName);
+if (name_index != -1) {
+  std::string type_name = metadata->value(name_index);
+  int data_index = metadata->FindKey(kExtensionMetadataKeyName);
+  std::string type_data = data_index == -1 ? "" : 
metadata->value(data_index);
+
+  std::shared_ptr ext_type = GetExtensionType(type_name);
+  if (ext_type != nullptr) {
+ARROW_ASSIGN_OR_RAISE(type, ext_type->Deserialize(type, type_data));
+// Remove the metadata, for faithful roundtripping
+RETURN_NOT_OK(metadata->DeleteMany({name_index, data_index}));

Review comment:
   @wesm Do you have an opinion on this?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox


github-actions[bot] commented on issue #7001:
URL: https://github.com/apache/arrow/pull/7001#issuecomment-617271478


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] davidanthoff opened a new pull request #7001: Use lowercase ws2_32 everywhere

2020-04-21 Thread GitBox


davidanthoff opened a new pull request #7001:
URL: https://github.com/apache/arrow/pull/7001


   With this patch I can cross-compile Arrow from a Linux system; in particular, 
I can build Windows binaries on Linux (using https://binarybuilder.org/). I 
hope to eventually be able to use the resulting binaries from Julia.
   
   My best guess is that the inconsistent casing of `ws2_32` in the various 
build files/systems is not a problem when compiling on Windows, because file 
systems there tend to be case-insensitive.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on a change in pull request #6959: ARROW-5649: [Integration][C++] Create integration test for extension types

2020-04-21 Thread GitBox


pitrou commented on a change in pull request #6959:
URL: https://github.com/apache/arrow/pull/6959#discussion_r412305089



##
File path: dev/archery/archery/integration/datagen.py
##
@@ -1401,6 +1437,18 @@ def generate_nested_dictionary_case():
   dictionaries=[dict0, dict1, dict2])
 
 
+def generate_extension_case():
+uuid_type = ExtensionType('uuid', 'uuid-serialization',
+  FixedSizeBinaryField('', 16))
+
+fields = [
+ExtensionField('uuids', uuid_type),

Review comment:
   Ok, it was more involved than I imagined, because the IPC layer had to 
be fixed as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6985: ARROW-8413: [C++][WIP] Refactor Generating validity bitmap for values column

2020-04-19 Thread GitBox


github-actions[bot] commented on issue #6985:
URL: https://github.com/apache/arrow/pull/6985#issuecomment-616252267


   https://issues.apache.org/jira/browse/ARROW-8413



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6989:
URL: https://github.com/apache/arrow/pull/6989#issuecomment-616395745


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take

2020-04-20 Thread GitBox


jorisvandenbossche commented on issue #6970:
URL: https://github.com/apache/arrow/pull/6970#issuecomment-616366094


   Should we document in the slice docstring that if the step is not 1, the 
result will be a copy (take) and not a zero-copy view? (I think people will 
typically assume no copy when slicing.)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold edited a comment on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold edited a comment on issue #6980:
URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880


   I built the docker image locally and ran the same script as the CI; however, 
I am unable to reproduce the linker error... The ursabot issue seems to have 
fixed itself, which is good I guess, but I'm probably going to need a hand with 
diagnosing the Debian CI issue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox


pitrou commented on issue #6981:
URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899


   Hmm, I don't think that's right. `Int96` is the physical representation of 
96-bit integers in Parquet files, and it's entirely little-endian. This means 
it should always have the same bit-representation, regardless of the platform's 
endianness.
   
   I think there are several places that need to be fixed:
   * the `Int96` tests in `parquet/arrow/arrow_reader_writer_test.cc`
   * the various `ToImpalaTimestamp` conversion functions in 
`parquet/column_writer.h`
   * the various `Int96` helper functions in `parquet/types.h`
   
   (I may be missing one or more)
   
   Note that `Int96` types are deprecated, so you may not want to sweat over 
this too much.
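   For reference, a small standalone sketch of the fixed little-endian Int96 
Impala timestamp layout (8 bytes of nanoseconds-of-day followed by 4 bytes of 
Julian day). This is illustrative, not the actual Parquet helper code; emitting 
bytes by shifting is endian-agnostic, so the on-disk pattern is identical on 
any host.
   ```cpp
   // Illustrative packing of an Impala-style Int96 timestamp into its
   // little-endian on-disk byte layout, independent of host endianness.
   #include <array>
   #include <cstdint>

   std::array<uint8_t, 12> PackImpalaTimestamp(int64_t nanos_of_day, int32_t julian_day) {
     std::array<uint8_t, 12> out{};
     for (int i = 0; i < 8; ++i) {
       out[i] = static_cast<uint8_t>((static_cast<uint64_t>(nanos_of_day) >> (8 * i)) & 0xFF);
     }
     for (int i = 0; i < 4; ++i) {
       out[8 + i] = static_cast<uint8_t>((static_cast<uint32_t>(julian_day) >> (8 * i)) & 0xFF);
     }
     return out;
   }

   int main() {
     // Example: 1 nanosecond into an arbitrary Julian day.
     const auto bytes = PackImpalaTimestamp(1, 2451545);
     return bytes[0] == 1 ? 0 : 1;
   }
   ```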



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on a change in pull request #6980:
URL: https://github.com/apache/arrow/pull/6980#discussion_r411174949



##
File path: rust/arrow/src/array/builder.rs
##
@@ -301,6 +324,21 @@ impl BufferBuilderTrait for 
BufferBuilder {
 Ok(())
 }
 
+fn append_n( self, n: usize, v: bool) -> Result<()> {
+self.reserve(n)?;
+if v {
+unsafe {
+bit_util::set_bits_raw(
+self.buffer.raw_data() as *mut u8,

Review comment:
   Changed, and fixed the others in the same file





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions

2020-04-20 Thread GitBox


github-actions[bot] commented on issue #6988:
URL: https://github.com/apache/arrow/pull/6988#issuecomment-616358473


   https://issues.apache.org/jira/browse/ARROW-8524



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on issue #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0

2020-04-20 Thread GitBox


jorisvandenbossche commented on issue #6961:
URL: https://github.com/apache/arrow/pull/6961#issuecomment-616389505


   > wheels-linux: 3.8 has a test failure (test_construct_from_list_of_files); 
François says he's seen this elsewhere. @jorisvandenbossche @kszucs is this 
another non-deterministic dataset test?
   
   @nealrichardson looks like it, yes (this was a newly introduced test) -> 
https://github.com/apache/arrow/pull/6989



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche opened a new pull request #6989: [Python] Fix non-deterministic row order failure in dataset tests

2020-04-20 Thread GitBox


jorisvandenbossche opened a new pull request #6989:
URL: https://github.com/apache/arrow/pull/6989


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance

2020-04-20 Thread GitBox


tustvold commented on issue #6980:
URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880


   I built the docker image locally and ran the same script as the CI; however, 
I am unable to reproduce the linker error... The ursabot issue seems to have 
fixed itself, which is good I guess, but I'm probably going to need a hand with 
this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox


kou commented on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616273563


   @github-actions crossbow submit debian-buster-amd64



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-19 Thread GitBox


cyb70289 commented on a change in pull request #6954:
URL: https://github.com/apache/arrow/pull/6954#discussion_r411062264



##
File path: cpp/cmake_modules/DefineOptions.cmake
##
@@ -101,7 +101,6 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL 
"${CMAKE_CURRENT_SOURCE_DIR}")
   define_option_string(ARROW_SIMD_LEVEL
"SIMD compiler optimization level"
"SSE4_2" # default to SSE4.2
-   "NONE"

Review comment:
   Thanks for the review. NONE restored.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects

2020-04-19 Thread GitBox


kou commented on issue #6983:
URL: https://github.com/apache/arrow/pull/6983#issuecomment-616287781


   @github-actions crossbow submit -g linux -g linux-arm



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader

2020-04-19 Thread GitBox


cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616294784


   I forgot to add the JIRA number in the first commit and added it later. It 
looks like the JIRA status is not synced with this PR.
   Shall I abandon this PR and push a new one?
   https://issues.apache.org/jira/browse/ARROW-8523



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jianxind commented on a change in pull request #6954: ARROW-8440: [C++] Refine SIMD header files

2020-04-19 Thread GitBox


jianxind commented on a change in pull request #6954:
URL: https://github.com/apache/arrow/pull/6954#discussion_r411045620



##
File path: cpp/cmake_modules/DefineOptions.cmake
##
@@ -101,7 +101,6 @@ if("${CMAKE_SOURCE_DIR}" STREQUAL 
"${CMAKE_CURRENT_SOURCE_DIR}")
   define_option_string(ARROW_SIMD_LEVEL
"SIMD compiler optimization level"
"SSE4_2" # default to SSE4.2
-   "NONE"

Review comment:
   I personally prefer to keep the NONE (zero) level here even though it may 
duplicate ARROW_USE_SIMD. Levels usually start from zero.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




  1   2   3   4   5   6   7   8   9   10   >