[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359 I've started a discussion on the [mailing-list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make other people aware of your efforts. I wonder if creating a `LETypedBufferBuilder` would make more sense than adding `AppendLE` methods (I don't think it makes sense to use two different endiannesses in a single buffer). It should probably be discussed on the ML. As for `Serialize`, I can't really tell you. Parquet encoding routines seem a bit all over the place (most of it happens in `encoding.cc`). Perhaps other developers can chime in... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
github-actions[bot] commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616667220 https://issues.apache.org/jira/browse/ARROW-8477 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616759329 Thank you for staring the discussion. I will watch at the thread. Yeah, `LETypedBufferBuilder` makes sense. It looks better than adding `AppendLE`. Regarding `Serialize`, it looks a good place where the class has both types of an element for Arrow and Parquet. But, encoding (i.e RLE and others) happens in `encoding.cc`. Let me check it tomorrow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] vertexclique commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance
vertexclique commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411501620 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_slice(), 1) } +default fn append_n( self, n: usize, v: T::Native) -> Result<()> { +self.reserve(n)?; +for _ in 0..n { +self.write_bytes(v.to_byte_slice(), 1)?; +} Review comment: Any preference to not to use an iterator here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
kiszk commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411585778 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length, T* values, int batch_size) { + using IndexType = int32_t; Review comment: For Parquet use case, the max width of the index is 32 based on https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tpboudreau commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
tpboudreau commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your changes look good. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on a change in pull request #6323: ARROW-7610: [Java] Finish support for 64 bit int allocations
BryanCutler commented on a change in pull request #6323: URL: https://github.com/apache/arrow/pull/6323#discussion_r411513297 ## File path: java/memory/src/main/java/org/apache/arrow/memory/NettyAllocationManager.java ## @@ -34,31 +33,24 @@ static final UnsafeDirectLittleEndian EMPTY = INNER_ALLOCATOR.empty; static final long CHUNK_SIZE = INNER_ALLOCATOR.getChunkSize(); - private final int allocatedSize; - private final UnsafeDirectLittleEndian memoryChunk; + private final long allocatedSize; - NettyAllocationManager(BaseAllocator accountingAllocator, int requestedSize) { -super(accountingAllocator); -this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize); Review comment: I don't think we should remove this which effectively replaces all allocations done in Arrow Java, which is a big change. `INNER_ALLOCATOR` also uses a pool which has some benefits. Instead, can you just change `requestedSize` to be a long, then check if it is over the max Int size and only then use `PlatformDependent.allocateMemory`? ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.memory; + +import static org.junit.Assert.assertEquals; + +import io.netty.buffer.ArrowBuf; + +/** + * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}. + * To run this test, please + *Make sure there are 4GB memory available in the system. + * + * Make sure the default allocation manager type is unsafe. Review comment: please update ## File path: java/memory/src/test/java/org/apache/arrow/memory/TestLargeArrowBuf.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.memory; + +import static org.junit.Assert.assertEquals; + +import io.netty.buffer.ArrowBuf; + +/** + * Integration test for large (more than 2GB) {@link io.netty.buffer.ArrowBuf}. + * To run this test, please + *Make sure there are 4GB memory available in the system. + * + * Make sure the default allocation manager type is unsafe. + * This can be achieved by the environmental variable or system property. + * The details can be found in {@link DefaultAllocationManagerOption}. + * + */ +public class TestLargeArrowBuf { + + private static void testLargeArrowBuf() { +final long bufSize = 4 * 1024 * 1024 * 1024L; +try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE); + ArrowBuf largeBuf = allocator.buffer(bufSize)) { + assertEquals(bufSize, largeBuf.capacity()); + System.out.println("Successfully allocated a buffer with capacity " + largeBuf.capacity()); + + for (long i = 0; i < bufSize / 8; i++) { +largeBuf.setLong(i * 8, i); + +if ((i + 1) % 1 == 0) { + System.out.println("Successfully written " + (i + 1) + " long words"); +} + } + System.out.println("Successfully written " + (bufSize / 8) + " long words"); + + for (long i = 0; i < bufSize / 8; i++) { +long val = largeBuf.getLong(i * 8); +assertEquals(i, val); + +if ((i + 1) % 1 == 0) { + System.out.println("Successfully read " + (i + 1) + " long words"); +} + } + System.out.println("Successfully read " + (bufSize / 8) + " long words"); +} +
[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616703869 Looks like Windows long paths are enabled by default on Github Actions. Cool! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
pitrou commented on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616748578 The remaining CI failure is unrelated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tpboudreau commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
tpboudreau commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411569025 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF")); ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn)); ASSERT_TRUE(created); + +#ifndef __APPLE__ Review comment: I experienced failures in the CI pipeline on macOS and I was unable to locate clear documentation of the path name limits (I'm not a macOS expert). I figured it might be best to separately address macOS in another issue, if there's community support for that. (This patch leaves macOS builds and tests unchanged.) If you believe this test should run as is under macOS, I'll remove the #ifndef and follow up on any issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tpboudreau opened a new pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
tpboudreau opened a new pull request #6993: URL: https://github.com/apache/arrow/pull/6993 This patch enables reading/writing of files with long (>260 characters) pathnames in Windows. In order for the new test to run under Windows, both (1) the test host must have long paths enabled in its registry, and (2) the test executable (arrow_utility_test.exe) must include a manifest indicating support for long paths (see https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#enable-long-paths-in-windows-10-version-1607-and-later). The test source code checks for (1) and the cmake file changes ensure (2). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411547763 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF")); ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn)); ASSERT_TRUE(created); + +#ifndef __APPLE__ Review comment: Why did you have to disable this test on macOS? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
pitrou commented on a change in pull request #6993: URL: https://github.com/apache/arrow/pull/6993#discussion_r411570280 ## File path: cpp/src/arrow/util/io_util_test.cc ## @@ -446,6 +446,56 @@ TEST(CreateDirTree, Basics) { ASSERT_OK_AND_ASSIGN(fn, temp_dir->path().Join("EF")); ASSERT_OK_AND_ASSIGN(created, CreateDirTree(fn)); ASSERT_TRUE(created); + +#ifndef __APPLE__ Review comment: I was just wondering. According to Google searches, the path length limit on macOS may be 1024, which this test exceeds. We can keep it like that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tpboudreau edited a comment on issue #6993: ARROW-8477: [C++] Enable reading and writing of long filenames for Windows
tpboudreau edited a comment on issue #6993: URL: https://github.com/apache/arrow/pull/6993#issuecomment-616714142 Your fixups look good. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] emkornfield commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
emkornfield commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411577540 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length, T* values, int batch_size) { + using IndexType = int32_t; Review comment: might be worth a comment here and below why IndexType is always static. Would it be possible to add a unit test with a larger value that would have failed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616716254 [AMD64 Conda Crossbow Submit (#101910)](https://ci.ursalabs.org/#builders/98/builds/641) builder has been succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015 Submitted crossbow builds: [ursa-labs/crossbow @ ursabot-572](https://github.com/ursa-labs/crossbow/branches/all?query=ursabot-572) |Task|Status| ||--| |gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-572-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616715997 @ursabot crossbow submit -g gandiva This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411578282 ## File path: cpp/src/arrow/util/rle_encoding.h ## @@ -414,6 +414,8 @@ static inline bool IndexInRange(int32_t idx, int32_t dictionary_length) { template inline int RleDecoder::GetBatchWithDict(const T* dictionary, int32_t dictionary_length, T* values, int batch_size) { + using IndexType = int32_t; Review comment: You mean a larger type, or an index larger than 2**32? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411675427 ## File path: rust/arrow/src/array/builder.rs ## @@ -236,6 +251,14 @@ impl BufferBuilderTrait for BufferBuilder { self.write_bytes(v.to_byte_slice(), 1) } +default fn append_n( self, n: usize, v: T::Native) -> Result<()> { +self.reserve(n)?; +for _ in 0..n { +self.write_bytes(v.to_byte_slice(), 1)?; +} Review comment: I'm not sure I understand what you mean? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds
bkietz commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616819386 @github-actions crossbow submit -g nightly This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] bkietz opened a new pull request #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds
bkietz opened a new pull request #6994: URL: https://github.com/apache/arrow/pull/6994 Add a `status.json` to the gh-pages summary of nightly builds to get around rate limiting This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887183 @github-actions crossbow submit test-r-linux-as-cran This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820980 https://issues.apache.org/jira/browse/ARROW-8043 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on issue #6988: ARROW-8524: [CI] Free up space on github actions
kou commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616819824 Wow! Awesome! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6994: ARROW-8043: [Developer][CI] Provide better visibility for nightly builds
github-actions[bot] commented on issue #6994: URL: https://github.com/apache/arrow/pull/6994#issuecomment-616820253 Revision: 89cf7325ab761a35b0c8a0da7096805984e18435 Submitted crossbow builds: [ursa-labs/crossbow @ actions-156](https://github.com/ursa-labs/crossbow/branches/all?query=actions-156) |Task|Status| ||--| |centos-6-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-6-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-6-amd64)| |centos-7-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-7-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-7-amd64)| |centos-8-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-centos-8-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-centos-8-amd64)| |conda-linux-gcc-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py36)| |conda-linux-gcc-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py37)| |conda-linux-gcc-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-linux-gcc-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-linux-gcc-py38)| |conda-osx-clang-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py36)| |conda-osx-clang-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py37)| |conda-osx-clang-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-osx-clang-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-osx-clang-py38)| |conda-win-vs2015-py36|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py36)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py36)| |conda-win-vs2015-py37|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py37)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py37)| |conda-win-vs2015-py38|[![Azure](https://dev.azure.com/ursa-labs/crossbow/_apis/build/status/ursa-labs.crossbow?branchName=actions-156-azure-conda-win-vs2015-py38)](https://dev.azure.com/ursa-labs/crossbow/_build/latest?definitionId=1=actions-156-azure-conda-win-vs2015-py38)| |debian-buster-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-buster-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-buster-amd64)| |debian-stretch-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-156-github-debian-stretch-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-156-github-debian-stretch-amd64)| |gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |homebrew-r-autobrew|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-156-travis-homebrew-r-autobrew.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |test-conda-cpp|[![Github
[GitHub] [arrow] kou commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects
kou commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616869503 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep
nealrichardson commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881056 @github-actions crossbow submit test-r-linux-as-cran This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616881426 Revision: 1ed83aaf5dd17d4e3b31aa1cc657f1220da2c8d4 Submitted crossbow builds: [ursa-labs/crossbow @ actions-157](https://github.com/ursa-labs/crossbow/branches/all?query=actions-157) |Task|Status| ||--| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-157-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-157-github-test-r-linux-as-cran)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #6995: WIP DO NOT MERGE 0.17.0 R release prep
nealrichardson opened a new pull request #6995: URL: https://github.com/apache/arrow/pull/6995 Having some trouble/slowness with r-hub for testing so made this PR to use crossbow. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6995: WIP DO NOT MERGE 0.17.0 R release prep
github-actions[bot] commented on issue #6995: URL: https://github.com/apache/arrow/pull/6995#issuecomment-616887542 Revision: 88c0198d775796d5a39644a22840a45470b4253f Submitted crossbow builds: [ursa-labs/crossbow @ actions-158](https://github.com/ursa-labs/crossbow/branches/all?query=actions-158) |Task|Status| ||--| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-158-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-158-github-test-r-linux-as-cran)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616921669 Opened a jira card https://issues.apache.org/jira/browse/ARROW-8537 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927551 Revision: 69081241244da5decee0bf0ea3cb2f24059d244d Submitted crossbow builds: [ursa-labs/crossbow @ actions-159](https://github.com/ursa-labs/crossbow/branches/all?query=actions-159) |Task|Status| ||--| |homebrew-cpp|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/actions-159-travis-homebrew-cpp.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula
nealrichardson commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616927211 @github-actions crossbow submit homebrew-cpp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula
nealrichardson opened a new pull request #6996: URL: https://github.com/apache/arrow/pull/6996 One more I didn't remove in ARROW-8222. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6996: ARROW-8538: [Packaging] Remove boost from homebrew formula
github-actions[bot] commented on issue #6996: URL: https://github.com/apache/arrow/pull/6996#issuecomment-616931067 https://issues.apache.org/jira/browse/ARROW-8538 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
cyb70289 commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616915079 @pitrou @wesm Oops, I only checked case "BitmapReader" from benchmark [arrow-bit-util-benchmark](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit_util_benchmark.cc). Obviously it's not enough. I compared all cases just now and see huge performance drop from below 4 tests: Before this patch: ```bash BenchmarkBitmapAnd/32768/1 563496 ns 563260 ns 1243 bytes_per_second=55.4806M/s BenchmarkBitmapAnd/131072/1 2219810 ns 2218984 ns 318 bytes_per_second=56.3321M/s BenchmarkBitmapAnd/32768/2 561738 ns 561467 ns 1265 bytes_per_second=55.6577M/s BenchmarkBitmapAnd/131072/2 2246229 ns 2245119 ns 305 bytes_per_second=55.6763M/s ``` After this patch: ```bash BenchmarkBitmapAnd/32768/11653467 ns 1652680 ns 422 bytes_per_second=18.9087M/s BenchmarkBitmapAnd/131072/1 6665501 ns 6661561 ns 105 bytes_per_second=18.7644M/s BenchmarkBitmapAnd/32768/21670793 ns 1670246 ns 423 bytes_per_second=18.7098M/s BenchmarkBitmapAnd/131072/2 6702369 ns 6698957 ns 103 bytes_per_second=18.6596M/s ``` Before reverting this patch, I would like to understand why it happens. BTW: we definitely need continuous benchmark tools to detect these things early. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616623385 I have been thinking about place candidates of the interface between the native endian and a PARQUET little-endian. One of the good candidates is `Serialize()` in `parquet/column_writer.cc`. Another candidate is `TypedBufferBuilder` in `arrow/buffer_builder.h`. Regarding `Serialize()`, this is because there is [a conversion loop](https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1781-1783) for Decimal128 that uses BigEndian. For big-endian, `Serialize()` of other primitive types including int96 needs to have such as conversion loop to little-endian. This is the first step. While the above approach leads to additional overhead, it would be good to have new methods `AppendLE` and `UnsafeAppendLE` in `TypedBufferBuilder` in addition to [`Append()` and `UnsafeAppend`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L204-L240]. These new method ensures to write typed data in little-endian. I think that we can support big-endian in Parquet using a two-step approach. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6989: [Python] Fix non-deterministic row order failure in dataset tests
github-actions[bot] commented on issue #6989: URL: https://github.com/apache/arrow/pull/6989#issuecomment-616395745 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on issue #6970: ARROW-2714: [Python] Implement variable step slicing with Take
jorisvandenbossche commented on issue #6970: URL: https://github.com/apache/arrow/pull/6970#issuecomment-616366094 Should we document this in the slice docstring that if the step is not 1, it will be a copy (take) and not a zero-copy view? (as I think people will typically assume no copy when slicing) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tustvold edited a comment on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance
tustvold edited a comment on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though, which is good I guess, but I'm probably going to need a hand with diagnosing the debian CI issue This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899 Hmm, I don't think that's right. `Int96` is the physical representation of 96-bit integers in Parquet files, and it's entirely little-endian. This means it should always have the same bit-representation, regardless of the platform's endianness. I think there are several places that need to be fixed: * the `Int96` tests in `parquet/arrow/arrow_reader_writer_test.cc` * the various `ToImpalaTimestamp` conversion functions in `parquet/column_writer.h` * the various `Int96` helper functions in `parquet/types.h` (I may be missing one or more) Note that `Int96` types are deprecated, so you may not want to lose your sweat over this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tustvold commented on a change in pull request #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance
tustvold commented on a change in pull request #6980: URL: https://github.com/apache/arrow/pull/6980#discussion_r411174949 ## File path: rust/arrow/src/array/builder.rs ## @@ -301,6 +324,21 @@ impl BufferBuilderTrait for BufferBuilder { Ok(()) } +fn append_n( self, n: usize, v: bool) -> Result<()> { +self.reserve(n)?; +if v { +unsafe { +bit_util::set_bits_raw( +self.buffer.raw_data() as *mut u8, Review comment: Changed, and fixed the others in the same file This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616358473 https://issues.apache.org/jira/browse/ARROW-8524 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on issue #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0
jorisvandenbossche commented on issue #6961: URL: https://github.com/apache/arrow/pull/6961#issuecomment-616389505 > wheels-linux: 3.8 has a test failure (test_construct_from_list_of_files); François says he's seen this elsewhere. @jorisvandenbossche @kszucs is this another non-deterministic dataset test? @nealrichardson looks like it, yes (this was a newly introduced test) -> https://github.com/apache/arrow/pull/6989 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche opened a new pull request #6989: [Python] Fix non-deterministic row order failure in dataset tests
jorisvandenbossche opened a new pull request #6989: URL: https://github.com/apache/arrow/pull/6989 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] tustvold commented on issue #6980: ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance
tustvold commented on issue #6980: URL: https://github.com/apache/arrow/pull/6980#issuecomment-616401880 I built the docker image locally and ran the same script as the CI, however, I am unable to reproduce the linker error... The ursabot issue seems to have fixed itself though, which is good I guess, but I'm probably going to need a hand with this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] zhztheplayer commented on issue #6967: ARROW-8499: [C++][Dataset] In ScannerBuilder, batch_size will not wor…
zhztheplayer commented on issue #6967: URL: https://github.com/apache/arrow/pull/6967#issuecomment-616345112 OK but as the PR is already merged, maybe a follow-up JIRA ticket is needed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616356603 @github-actions crossbow submit ubuntu-bionic-amd64 test-conda-cpp test-r-linux-as-cran This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6988: [CI] Try to free up space on github actions [WIP]
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616334547 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs opened a new pull request #6988: [CI] Try to free up space on github actions [WIP]
kszucs opened a new pull request #6988: URL: https://github.com/apache/arrow/pull/6988 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6988: ARROW-8524: [CI] Free up space on github actions
github-actions[bot] commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616357456 Revision: f19af84b7b6af216b91a56956590ebce051b69c7 Submitted crossbow builds: [ursa-labs/crossbow @ actions-155](https://github.com/ursa-labs/crossbow/branches/all?query=actions-155) |Task|Status| ||--| |test-conda-cpp|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-test-conda-cpp)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-test-conda-cpp)| |test-r-linux-as-cran|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-test-r-linux-as-cran)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-test-r-linux-as-cran)| |ubuntu-bionic-amd64|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-155-github-ubuntu-bionic-amd64)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-155-github-ubuntu-bionic-amd64)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, `Int96` structure in memory can be represented in native endian. When it will be written into a file, we carefully have to keep it in a little-endian. ~~I just realized it that int32 in [int64, int32] is still a little-endian. After one or two hours, I will update this PR.~~ I will be back to here after one hour. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #6961: ARROW-8517: [Release] Update Crossbow release verification tasks for 0.17.0 RC0
kszucs commented on a change in pull request #6961: URL: https://github.com/apache/arrow/pull/6961#discussion_r411236556 ## File path: dev/tasks/verify-rc/github.nix.yml ## @@ -64,8 +69,9 @@ jobs: fi if [ "$TEST_RUBY" = "1" ]; then ruby --version + sudo gem install bundler fi fi - # TODO: put 0.16.0 2 in some separate file? + # TODO: put version and rc number in some separate file? Review comment: Not yet, created jira https://issues.apache.org/jira/browse/ARROW-8525 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
pitrou commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616468493 @cyb70289 It's ok, we can keep this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects
kszucs commented on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB(https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github actions, and we can free up more if required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on issue #6983: ARROW-8519: [C++][Packaging] Reduce disk usage for external projects
kszucs edited a comment on issue #6983: URL: https://github.com/apache/arrow/pull/6983#issuecomment-616439081 @kou we don't need to move to travis, I [managed to free up 23GB](https://github.com/apache/arrow/commit/b20f7091e63684804cb6ba76e4f72fcd38040cfd) of additional space on github actions, and we can free up more if required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the approach ` entirely little-endian`. Then, I reconsidered it. I thought that each primitive type should be represented in a little-endian as shown [here](https://github.com/apache/parquet-format/blob/master/Encodings.md) in **Parquet serialized form (e.g. file)**. For example, int32 is kept using a native endian in a memory or register. It will be converted from/to a little-endian when it will be serialized. I extended this to Int96. This is a background of the current decision. Both approaches can work functionally correct. I understand that this is an implementation decision. It is ok with your approach. At that time, I have one question. How will we handle other primitive types (e.g. int32 or double)? Do we keep other primitives in entirely little-endian on both platforms (little-endian and big-endian CPUs)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616456984 I think data should be kept in native endianness in memory (that is what the user would expect). What we must be careful is that Parquet data is encoded (and decoded) as little endian. Your PR doesn't address that, AFAICT. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on issue #6988: ARROW-8524: [CI] Free up space on github actions
kszucs commented on issue #6988: URL: https://github.com/apache/arrow/pull/6988#issuecomment-616432127 Build failures are unrelated. +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616453579 I understand your point. First, I implemented the approach ` entirely little-endian`. Then, I reconsidered it. Each primitive type should be represented in a little-endian as shown [here](https://github.com/apache/parquet-format/blob/master/Encodings.md) in **Parquet serialized form (e.g. file)**. For example, int32 is kept using a native endian in a memory or register. It will be converted from/to a little-endian when it will be serialized. I extended this to Int96. This is a background of the current decision. Both approaches can work functionally correct. I understand that this is an implementation decision. It is ok with your approach. At that time, I have one question. How will we handle other primitive types (e.g. int32 or double)? Do we keep other primitives in entirely little-endian on both platforms (little-endian and big-endian CPUs)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616462606 Thank you for your clarification. Based on this, `Int96` structure in memory can be represented in native endian. When it will be written into a file, we carefully have to keep it in a little-endian. I just realized it that int32 in [int64, int32] is still a little-endian. After one or two hours, I will update this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pprudhvi opened a new pull request #6990: ARROW-???? : fix gandiva macos build
pprudhvi opened a new pull request #6990: URL: https://github.com/apache/arrow/pull/6990 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pprudhvi commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build
pprudhvi commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521137 @ursabot crossbow submit -g gandiva This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ursabot commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build
ursabot commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616521423 [AMD64 Conda Crossbow Submit (#101851)](https://ci.ursalabs.org/#builders/98/builds/640) builder has been succeeded. Revision: a051a430c8dfc9d0cea307a3d0dcb23e6efc2015 Submitted crossbow builds: [ursa-labs/crossbow @ ursabot-571](https://github.com/ursa-labs/crossbow/branches/all?query=ursabot-571) |Task|Status| ||--| |gandiva-jar-osx|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-571-travis-gandiva-jar-osx.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| |gandiva-jar-xenial|[![TravisCI](https://img.shields.io/travis/ursa-labs/crossbow/ursabot-571-travis-gandiva-jar-xenial.svg)](https://travis-ci.org/ursa-labs/crossbow/branches)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616530174 Can you add the explanation you gave above (about the memory layout) somewhere in `parquet/types.h`? Thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616533191 I thought we had discussed removing the `ARROW_USE_SIMD` option altogether This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
wesm commented on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm edited a comment on issue #6986: ARROW-8523: [C++] Optimize BitmapReader
wesm edited a comment on issue #6986: URL: https://github.com/apache/arrow/pull/6986#issuecomment-616536379 Cool, nice improvement (is this captures in our benchmark executables?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR address only test cases. This PR does not address the routines (like `column_writer.h`) yet. To update test cases can clarify the specification and implementation like this. In this change, int96, which is a pair of uint64 and uint32, is stored in memory as follows: ``` For little-endian Int96 Int96_min = {{1024, 2048, 4096}}; 00 04 00 00 00 08 00 00 00 10 00 00 ~~~ ~~~ uint64 (LE) uint32 (LE) For big-endian, Int96 Int96_min = {{2048, 1024, 4096}}; 00 00 08 00 00 00 04 00 00 00 10 00 ~~~ ~~~ uint64 (BE) uint32 (BE) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou opened a new pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
pitrou opened a new pull request #6991: URL: https://github.com/apache/arrow/pull/6991 NextCounts() should be parametered with the dictionary index type, not the value type. Previous code seems to have succeeded by chance on little-endian platforms. See discussion in ARROW-8486. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
github-actions[bot] commented on issue #6991: URL: https://github.com/apache/arrow/pull/6991#issuecomment-616529709 https://issues.apache.org/jira/browse/ARROW-8529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411355185 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/NonNullableStructVector.java ## @@ -320,6 +322,20 @@ public int hashCode(int index, ArrowBufHasher hasher) { return visitor.visit(this, value); } + @Override + public void validate() { +if (getValueCount() < 0) { + throw new RuntimeException("vector valueCount is negative"); +} + +if (getNullCount() > getValueCount()) { Review comment: For the Java implementation, since the null count is computed each time, we shoud never expect getNullCount() > getValueCount()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616540298 Sure, at first, I derived this layout from [this method](https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.h#L576). Other resources: https://github.com/apache/arrow/issues/1920#issuecomment-541764279 https://stackoverflow.com/questions/53103762/cast-int96-timestamp-from-parquet-to-golang/53104516#53104516 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #6966: ARROW-8497: [Archery] Add missing components to build options
pitrou commented on a change in pull request #6966: URL: https://github.com/apache/arrow/pull/6966#discussion_r411365701 ## File path: dev/archery/archery/cli.py ## @@ -134,36 +134,72 @@ def _apply_options(cmd, options): help="CMake's CMAKE_BUILD_TYPE") @click.option("--warn-level", default="production", type=warn_level_type, help="Controls compiler warnings -W(no-)error.") -# components +@click.option("--use-gold-linker", default=True, type=BOOL, + help="Toggles ARROW_USE_LD_GOLD option.") +# Tests and benchmarks @click.option("--with-tests", default=True, type=BOOL, help="Build with tests.") -@click.option("--with-benchmarks", default=False, type=BOOL, +@click.option("--with-benchmarks", default=None, type=BOOL, help="Build with benchmarks.") -@click.option("--with-examples", default=False, type=BOOL, +@click.option("--with-examples", default=None, type=BOOL, help="Build with examples.") -@click.option("--with-python", default=False, type=BOOL, - help="Build with python extension.") -@click.option("--with-parquet", default=False, type=BOOL, - help="Build with parquet file support.") -@click.option("--with-gandiva", default=False, type=BOOL, +@click.option("--with-integration", default=None, type=BOOL, + help="Build with integration test executables.") +# Static checks +@click.option("--use-asan", default=None, type=BOOL, + help="Toggles ARROW_USE_ASAN sanitizer.") +@click.option("--use-tsan", default=None, type=BOOL, + help="Toggles ARROW_USE_TSAN sanitizer.") +@click.option("--use-ubsan", default=None, type=BOOL, + help="Toggles ARROW_USE_UBSAN sanitizer.") +@click.option("--with-fuzzing", default=None, type=BOOL, + help="Toggles ARROW_FUZZING.") +# Components +@click.option("--with-compute", default=None, type=BOOL, + help="Build the arrow compute module.") +@click.option("--with-csv", default=None, type=BOOL, + help="Build the arrow csv parser module.") +@click.option("--with-cuda", default=None, type=BOOL, + help="Build the arrow cuda parser module.") +@click.option("--with-dataset", default=None, type=BOOL, + help="Build the arrow dataset module.") +@click.option("--with-filesystem", default=None, type=BOOL, + help="Build the arrow FileSystem layer.") Review comment: Nit: no need to capitalize "filesystem". ## File path: dev/archery/archery/cli.py ## @@ -134,36 +134,72 @@ def _apply_options(cmd, options): help="CMake's CMAKE_BUILD_TYPE") @click.option("--warn-level", default="production", type=warn_level_type, help="Controls compiler warnings -W(no-)error.") -# components +@click.option("--use-gold-linker", default=True, type=BOOL, + help="Toggles ARROW_USE_LD_GOLD option.") +# Tests and benchmarks @click.option("--with-tests", default=True, type=BOOL, help="Build with tests.") -@click.option("--with-benchmarks", default=False, type=BOOL, +@click.option("--with-benchmarks", default=None, type=BOOL, help="Build with benchmarks.") -@click.option("--with-examples", default=False, type=BOOL, +@click.option("--with-examples", default=None, type=BOOL, help="Build with examples.") -@click.option("--with-python", default=False, type=BOOL, - help="Build with python extension.") -@click.option("--with-parquet", default=False, type=BOOL, - help="Build with parquet file support.") -@click.option("--with-gandiva", default=False, type=BOOL, +@click.option("--with-integration", default=None, type=BOOL, + help="Build with integration test executables.") +# Static checks +@click.option("--use-asan", default=None, type=BOOL, + help="Toggles ARROW_USE_ASAN sanitizer.") +@click.option("--use-tsan", default=None, type=BOOL, + help="Toggles ARROW_USE_TSAN sanitizer.") +@click.option("--use-ubsan", default=None, type=BOOL, + help="Toggles ARROW_USE_UBSAN sanitizer.") +@click.option("--with-fuzzing", default=None, type=BOOL, + help="Toggles ARROW_FUZZING.") +# Components +@click.option("--with-compute", default=None, type=BOOL, + help="Build the arrow compute module.") +@click.option("--with-csv", default=None, type=BOOL, + help="Build the arrow csv parser module.") Review comment: "CSV"? ## File path: dev/archery/archery/cli.py ## @@ -134,36 +134,72 @@ def _apply_options(cmd, options): help="CMake's CMAKE_BUILD_TYPE") @click.option("--warn-level", default="production", type=warn_level_type, help="Controls compiler warnings -W(no-)error.") -# components +@click.option("--use-gold-linker", default=True, type=BOOL, + help="Toggles ARROW_USE_LD_GOLD option.") +# Tests
[GitHub] [arrow] github-actions[bot] commented on issue #6990: ARROW-8528 : [CI][NIGHTLY:gandiva-jar-osx] fix gandiva osx build
github-actions[bot] commented on issue #6990: URL: https://github.com/apache/arrow/pull/6990#issuecomment-616522599 https://issues.apache.org/jira/browse/ARROW-8528 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk edited a comment on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian
kiszk edited a comment on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616507292 At first, this PR addresses only test cases. This PR does not address the routines (like `column_writer.h`) yet. To update test cases can clarify the specification and implementation like this. In this change, int96, which is a pair of uint64 and uint32, is stored in memory as follows: ``` For little-endian Int96 Int96_min = {{1024, 2048, 4096}}; 00 04 00 00 00 08 00 00 00 10 00 00 ~~~ ~~~ uint64 (LE) uint32 (LE) For big-endian, Int96 Int96_min = {{2048, 1024, 4096}}; 00 00 08 00 00 00 04 00 00 00 10 00 ~~~ ~~~ uint64 (BE) uint32 (BE) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on a change in pull request #6912: ARROW-8020: [Java] Implement vector validate functionality
liyafan82 commented on a change in pull request #6912: URL: https://github.com/apache/arrow/pull/6912#discussion_r411352198 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ValueVector.java ## @@ -283,4 +283,10 @@ * @return the name of the vector. */ String getName(); + + /** + * Validate the vector, will throw exception if validate fail. + */ + void validate(); Review comment: I agree with you that it makes the client code simpler. IMO, the real benefits are 1) one fewer virtual function call; and 2) one fewer stack frame. However, the cost is that it makes the interface larger and difficult to maintain. In addition, adding new APIs to a public interface has the risk of crashing client code. So I think the method of adding new APIs should be reserved for performance-critical operations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche opened a new pull request #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old
jorisvandenbossche opened a new pull request #6992: URL: https://github.com/apache/arrow/pull/6992 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old
jorisvandenbossche commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616595450 This is still WIP (depending on which pandas version we choose, we can clean up some things in the pandas-shim.pxi), but: - I defined the minimal pandas version now as pandas 0.23 (released 2 years ago). But, I also ran tests with pandas 0.22 and pandas 0.21, and tests are still passing for those. So we *could* also set this minimal version at 0.21 (the mailing list thread that triggered this issue reported an error on pandas 0.20). But personally, I think 0.23 is more than fine. - I now raise an error when the pandas version is too low. But, since it is actually working (mostly) for lower versions as well, we could also only give a warning instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on issue #6992: ARROW-7950: [Python] Determine + test minimal pandas version + raise error when pandas is too old
github-actions[bot] commented on issue #6992: URL: https://github.com/apache/arrow/pull/6992#issuecomment-616601667 https://issues.apache.org/jira/browse/ARROW-7950 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files
wesm commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616604745 Maybe it was just a thought I had in my head but never expressed. Opened https://issues.apache.org/jira/browse/ARROW-8531 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #6991: ARROW-8529: [C++] Fix usage of NextCounts() on dictionary-encoded data
pitrou commented on a change in pull request #6991: URL: https://github.com/apache/arrow/pull/6991#discussion_r411416006 ## File path: .github/workflows/cpp.yml ## @@ -228,7 +228,9 @@ jobs: run: ci/scripts/util_checkout.sh - name: Build shell: bash -run: ci/scripts/cpp_build.sh $(pwd) $(pwd)/build +run: | + export BOOST_ROOT=$BOOST_ROOT_1_72_0 + ci/scripts/cpp_build.sh $(pwd) $(pwd)/build Review comment: This is an unrelated CI fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on issue #6954: ARROW-8440: [C++] Refine SIMD header files
cyb70289 commented on issue #6954: URL: https://github.com/apache/arrow/pull/6954#issuecomment-616585651 > I thought we had discussed removing the `ARROW_USE_SIMD` option altogether @wesm remove `ARROW_USE_SIMD`? I remember a thread about default simd level( https://github.com/apache/arrow/pull/6907#issuecomment-612896453), but not `ARROW_USE_SIMD`. That said, I would like to remove it as it overlaps other setting. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org