[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
kiszk commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450040328 ## File path: cpp/src/arrow/ipc/test_common.cc ## @@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool include_nulls, MemoryPool* pool return Status::OK(); } +template +static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, + std::shared_ptr* out, uint32_t seed) { + random::RandomArrayGenerator rand(seed); + const double null_probability = include_nulls ? 0.5 : 0.0; + + *out = rand.Numeric(length, 0, 1000, null_probability); + + return Status::OK(); +} + +template <> Review comment: specialize for 8-bit data. This is because range(0, 1000) is too big for 8-bit data. Win64 causes hang-up without this specialization. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r450041982 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.memory; + +import org.apache.arrow.memory.util.MemoryUtil; + +/** + * The default Allocation Manager Factory for a module. + * + * This will be split out to the arrow-memory-netty/arrow-memory-unsafe modules + * as part of ARROW-8230. This is currently a placeholder which defaults to Netty. + * + */ +public class DefaultAllocationManagerFactory implements AllocationManager.Factory { Review comment: We have 3 classes in the code base named "DefaultAllocationManagerFactory". Do we need all of them? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r450071734 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.memory; + +import org.apache.arrow.memory.util.MemoryUtil; + +/** + * The default Allocation Manager Factory for a module. + * + * This will be split out to the arrow-memory-netty/arrow-memory-unsafe modules + * as part of ARROW-8230. This is currently a placeholder which defaults to Netty. + * + */ +public class DefaultAllocationManagerFactory implements AllocationManager.Factory { Review comment: Hey @liyafan82, yes i think we do: 1) lives in the netty module. 2) lives in the unsafe module 3) lives in the tests portion of the core memory module. On startup we first check if a specific memory allocator has been requested. If yes we instantiate that. If no specific impl has been requested then we search the classpath for a `DefaultAllocationManagerFactory`. If we find one then we instantiate it, else we throw a Runtime exception (can't continue without an allocation manager). So each concrete impl of `AllocationManager` requires a factory to create it and we have a trivial allocation manager in `arrow-memory-core` tests to help the core tests finish. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7653: ARROW-9343: [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces
github-actions[bot] commented on pull request #7653: URL: https://github.com/apache/arrow/pull/7653#issuecomment-654606536 https://issues.apache.org/jira/browse/ARROW-9343 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] projjal opened a new pull request #7653: ARROW-9343: [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces
projjal opened a new pull request #7653: URL: https://github.com/apache/arrow/pull/7653 Also refactored the code to remove the parse_float helper function from context. This was done earlier to save the constructor cost when the stringconverter object from arrow/util/parsing was needed to be constructed first. Now that it has changed to parametric functions, its not needed to keep them in the executioncontext. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-654234467 @pitrou added the C++ tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove commented on pull request #7176: ARROW-8796: [Rust] feat: Allow writers to use Vec
andygrove commented on pull request #7176: URL: https://github.com/apache/arrow/pull/7176#issuecomment-654252387 @sunchao Did you have a chance to look at the code sample provided? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
pitrou commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-654237098 Looks like the PR needs rebasing and fixing for the latest union changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove closed pull request #7591: ARROW-8535: [Rust] Specify arrow-flight version
andygrove closed pull request #7591: URL: https://github.com/apache/arrow/pull/7591 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove commented on pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype
andygrove commented on pull request #7253: URL: https://github.com/apache/arrow/pull/7253#issuecomment-654248872 I'm going to take another run at this, with smaller changes in new PRs. Thanks for the reviews so far. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] andygrove closed pull request #7253: ARROW-4957: [Rust] [DataFusion] Re-implement get_supertype
andygrove closed pull request #7253: URL: https://github.com/apache/arrow/pull/7253 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn opened a new pull request #7643: ARROW-9331: [C++] Improve the conversion performance from Tensor to SparseCOOTensor
mrkn opened a new pull request #7643: URL: https://github.com/apache/arrow/pull/7643 In this pull-request, the slowing down of the conversion introduced in #7539 is canceled, and the conversion speed is improved than before #7539 in some cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options
pitrou commented on pull request #7620: URL: https://github.com/apache/arrow/pull/7620#issuecomment-654273536 Thanks for the review @kou . I believe I've addressed all of your comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] projjal opened a new pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva
projjal opened a new pull request #7642: URL: https://github.com/apache/arrow/pull/7642 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
jorisvandenbossche commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r450201426 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -262,6 +302,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) { PyDateTime_GET_DAY(pydate)); } +// GIL must be held when calling this function. +Status StringToTzinfo(const std::string& tz, PyObject** tzinfo) { Review comment: Looks fine! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7644: ARROW-9330: [C++] Fix crash and undefined behaviour on corrupt IPC input
github-actions[bot] commented on pull request #7644: URL: https://github.com/apache/arrow/pull/7644#issuecomment-654284383 https://issues.apache.org/jira/browse/ARROW-9330 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs
github-actions[bot] commented on pull request #7646: URL: https://github.com/apache/arrow/pull/7646#issuecomment-654370419 https://issues.apache.org/jira/browse/ARROW-7237 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450383737 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: We can ensure to have an explicit type to the value scalar from `array.GetScalar()` but not when a null union scalar is directly constructed, so `MakeNullScalar(union_type)` won't know which type to choose for the underlying value scalar. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message
nealrichardson opened a new pull request #7647: URL: https://github.com/apache/arrow/pull/7647 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message
github-actions[bot] commented on pull request #7647: URL: https://github.com/apache/arrow/pull/7647#issuecomment-654385353 https://issues.apache.org/jira/browse/ARROW-9337 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects
kszucs commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r450396810 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -612,6 +613,83 @@ def test_make_fragment(multisourcefs): assert row_group_fragment.row_groups == [ds.RowGroupInfo(0)] +def test_make_csv_fragment_from_buffer(): +content = textwrap.dedent(""" +alpha,num,animal +a,12,dog +b,11,cat +c,10,rabbit +""") +buffer = pa.py_buffer(content.encode('utf-8')) + +csv_format = ds.CsvFileFormat() +fragment = csv_format.make_fragment(buffer) + +expected = pa.table([['a', 'b', 'c'], + [12, 11, 10], + ['dog', 'cat', 'rabbit']], +names=['alpha', 'num', 'animal']) +assert fragment.to_table().equals(expected) + +pickled = pickle.loads(pickle.dumps(fragment)) +assert pickled.to_table().equals(fragment.to_table()) + + +@pytest.mark.parquet +def test_make_parquet_fragment_from_buffer(): +import pyarrow.parquet as pq + +cases = [ +( +pa.table( +[ +['a', 'b', 'c'], +[12, 11, 10], +['dog', 'cat', 'rabbit'] Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
BryanCutler commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654366148 Looks like the Spark test fails with ``` File "/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 593, in f "{} != {}".format(expected_key[i][1], window_range) AssertionError: {'start': datetime.datetime(2018, 3, 10, 0, 0), 'end': datetime.datetime(2018, 3, 15, 0, 0)} != {'start': datetime.datetime(2018, 3, 10, 0, 0, tzinfo=), 'end': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=)} ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler edited a comment on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
BryanCutler edited a comment on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654366148 Looks like the Spark test fails with ``` File "/spark/python/pyspark/sql/tests/test_pandas_grouped_map.py", line 593, in f "{} != {}".format(expected_key[i][1], window_range) AssertionError: {'start': datetime.datetime(2018, 3, 10, 0, 0), 'end': datetime.datetime(2018, 3, 15, 0, 0)} != {'start': datetime.datetime(2018, 3, 10, 0, 0, tzinfo=), 'end': datetime.datetime(2018, 3, 15, 0, 0, tzinfo=)} ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects
jorisvandenbossche commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r450390113 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -612,6 +613,83 @@ def test_make_fragment(multisourcefs): assert row_group_fragment.row_groups == [ds.RowGroupInfo(0)] +def test_make_csv_fragment_from_buffer(): +content = textwrap.dedent(""" +alpha,num,animal +a,12,dog +b,11,cat +c,10,rabbit +""") +buffer = pa.py_buffer(content.encode('utf-8')) + +csv_format = ds.CsvFileFormat() +fragment = csv_format.make_fragment(buffer) + +expected = pa.table([['a', 'b', 'c'], + [12, 11, 10], + ['dog', 'cat', 'rabbit']], +names=['alpha', 'num', 'animal']) +assert fragment.to_table().equals(expected) + +pickled = pickle.loads(pickle.dumps(fragment)) +assert pickled.to_table().equals(fragment.to_table()) + + +@pytest.mark.parquet +def test_make_parquet_fragment_from_buffer(): +import pyarrow.parquet as pq + +cases = [ +( +pa.table( +[ +['a', 'b', 'c'], +[12, 11, 10], +['dog', 'cat', 'rabbit'] Review comment: you could reduce the used vertical space here a bit by only defining the list of arrays here, and do `table = pa.table(arrays, names=['alpha', 'num', 'animal'])` in the loop This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou opened a new pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs
pitrou opened a new pull request #7646: URL: https://github.com/apache/arrow/pull/7646 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
pitrou commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450371161 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: It the value scalar is not a nullptr, then it would seem better, unless that's complicated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
jorisvandenbossche commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654377721 Which I think is as expected? (at least if the data in the test are tz-aware) And thus the test can be updated on the spark side? Since we actually change to now return datetimes with tzinfo set, instead of losing the timezone. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
kiszk commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450287754 ## File path: cpp/src/arrow/ipc/test_common.cc ## @@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool include_nulls, MemoryPool* pool return Status::OK(); } +template +static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, + std::shared_ptr* out, uint32_t seed) { + random::RandomArrayGenerator rand(seed); + const double null_probability = include_nulls ? 0.5 : 0.0; + + *out = rand.Numeric(length, 0, 1000, null_probability); + + return Status::OK(); +} + +template <> +Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, Review comment: To make this static cause the compilation error. I will put them into the anonymous namespace. ``` /home/ishizaki/Arrow/arrow/cpp/src/arrow/ipc/test_common.cc:94:1: error: explicit template specialization cannot have a storage class static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, ^~ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450317608 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: I think `GetScalar()` should return a Scalar with the same type as the Array. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-654308059 @pitrou updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
kiszk commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450302190 ## File path: cpp/src/arrow/ipc/test_common.cc ## @@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool include_nulls, MemoryPool* pool return Status::OK(); } +template +static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, + std::shared_ptr* out, uint32_t seed) { + random::RandomArrayGenerator rand(seed); + const double null_probability = include_nulls ? 0.5 : 0.0; + + *out = rand.Numeric(length, 0, 1000, null_probability); + + return Status::OK(); +} + +template <> +Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, Review comment: I will just remove `static` like other MakeRandom...Array. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array
github-actions[bot] commented on pull request #7645: URL: https://github.com/apache/arrow/pull/7645#issuecomment-654314571 https://issues.apache.org/jira/browse/ARROW-8374 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
pitrou commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450314430 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: Hmm... interesting. Shouldn't it be `MakeNullScalar(utf8())`? ## File path: cpp/src/arrow/scalar_test.cc ## @@ -80,6 +84,15 @@ TYPED_TEST(TestNumericScalar, Basics) { auto dyn_null_value = MakeNullScalar(expected_type); ASSERT_EQ(*null_value, *dyn_null_value); + + // test Array.GetScalar + auto arr = ArrayFromJSON(expected_type, "[null, 1, 2]"); + ASSERT_OK_AND_ASSIGN(auto null, arr->GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto one, arr->GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto two, arr->GetScalar(2)); + ASSERT_TRUE(null->Equals(*null_value)); + ASSERT_TRUE(one->Equals(ScalarType(1))); + ASSERT_TRUE(two->Equals(ScalarType(2))); Review comment: You never check that `Equals` returns false? ## File path: cpp/src/arrow/scalar.cc ## @@ -185,6 +192,35 @@ DictionaryScalar::DictionaryScalar(std::shared_ptr type) 0) .ValueOrDie()} {} +Result> DictionaryScalar::GetEncodedValue() const { + const auto& dict_type = checked_cast(*type); + + if (!is_valid) { +return MakeNullScalar(dict_type.value_type()); + } + + int64_t index_value = 0; + switch (dict_type.index_type()->id()) { +case Type::INT8: + index_value = checked_cast(*value.index).value; + break; +case Type::INT16: + index_value = checked_cast(*value.index).value; + break; +case Type::INT32: + index_value = checked_cast(*value.index).value; + break; +case Type::INT64: + index_value = checked_cast(*value.index).value; + break; +default: + return Status::Invalid("Not implemented dictionary index type"); Review comment: `Status::TypeError`? ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid,
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450327095 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: We have the value type in `UnionScalar.value.type`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options
pitrou commented on pull request #7620: URL: https://github.com/apache/arrow/pull/7620#issuecomment-654331202 The "AMD64 MacOS 10.15 Python 3.7" CI failure looks unrelated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
pitrou commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450267748 ## File path: cpp/src/arrow/testing/random.h ## @@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator { std::shared_ptr Int64(int64_t size, int64_t min, int64_t max, double null_probability = 0); + /// \brief Generates a random HalfFloatArray + /// + /// \param[in] size the size of the array to generate + /// \param[in] min the lower bound of the uniform distribution + /// \param[in] max the upper bound of the uniform distribution Review comment: Hmm... this will not really be a *uniform* float16 distribution if you're selecting random values for the underlying int16 representation, right? ## File path: cpp/src/arrow/ipc/test_common.cc ## @@ -79,6 +79,39 @@ Status MakeRandomInt32Array(int64_t length, bool include_nulls, MemoryPool* pool return Status::OK(); } +template +static Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, + std::shared_ptr* out, uint32_t seed) { + random::RandomArrayGenerator rand(seed); + const double null_probability = include_nulls ? 0.5 : 0.0; + + *out = rand.Numeric(length, 0, 1000, null_probability); + + return Status::OK(); +} + +template <> +Status MakeRandomArray(int64_t length, bool include_nulls, MemoryPool* pool, Review comment: Make those static as well? (or put them in the anonymous namespace) ## File path: cpp/src/arrow/ipc/feather_test.cc ## @@ -294,6 +325,26 @@ INSTANTIATE_TEST_SUITE_P( TestParam(kFeatherV2Version, Compression::LZ4_FRAME), TestParam(kFeatherV2Version, Compression::ZSTD))); +#define BATCH_CASES() \ Review comment: `using kBatchCases = ::testing::Values(...)` perhaps? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
kiszk commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450283575 ## File path: cpp/src/arrow/testing/random.h ## @@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator { std::shared_ptr Int64(int64_t size, int64_t min, int64_t max, double null_probability = 0); + /// \brief Generates a random HalfFloatArray + /// + /// \param[in] size the size of the array to generate + /// \param[in] min the lower bound of the uniform distribution + /// \param[in] max the upper bound of the uniform distribution Review comment: Good point. It will not be uniform distribution for float16 Should I only update the comment? Or, should I create uniform distribution float16 random while the current std does not support float16? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] romainfrancois opened a new pull request #7645: ARROW-8374 [R]: Table to vector of DictonaryType will error when Arrays don't have the same Dictionary per array
romainfrancois opened a new pull request #7645: URL: https://github.com/apache/arrow/pull/7645 This needs some testing: ``` r library(arrow, warn.conflicts = FALSE) f1 <- factor(c("a"), levels = c("a", "b")) f2 <- factor(c("c"), levels = c("c", "d")) ca <- ChunkedArray$create(f1, f2) ca$as_vector() #> [1] a c #> Levels: a b c d ca$type #> DictionaryType #> dictionary tab <- Table$create( record_batch(f = f1), record_batch(f = f2) ) tab #> Table #> 2 rows x 1 columns #> $f > df <- as.data.frame(tab) df #> # A tibble: 2 x 1 #> f #> #> 1 a #> 2 c df$f #> [1] a c #> Levels: a b c d ``` Created on 2020-07-06 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0.9001) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
pitrou commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450319790 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: Ah, I see. Yes, you're probably right. A pity we lose information about the value type, then. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
pitrou commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450328412 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: Ah, perhaps check that in the test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects
jorisvandenbossche commented on pull request #7631: URL: https://github.com/apache/arrow/pull/7631#issuecomment-654334653 I opened https://issues.apache.org/jira/browse/ARROW-9332 for the parquet statistics This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
rok commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r450340186 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names: c_dim_names.push_back(tobytes(x)) -coords = np.vstack([obj.row, obj.col]).T +row = obj.row +col = obj.col +if obj.has_canonical_format: +order = np.lexsort((col, row)) # row-major order Review comment: So column-major is scipy-canonical and row-major is tensorflow canonical? Would we then not rather just introduce a new property e.g.: `has_duplicates`? I would find that less complex. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rok commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
rok commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r450325016 ## File path: python/pyarrow/tensor.pxi ## @@ -270,8 +279,10 @@ shape: {0.shape}""".format(self) _data, _coords)) data = PyObject_to_object(out_data) coords = PyObject_to_object(out_coords) -result = coo_matrix((data[:, 0], (coords[:, 0], coords[:, 1])), -shape=self.shape) +row, col = coords[:, 0], coords[:, 1] +result = coo_matrix((data[:, 0], (row, col)), shape=self.shape) +if self.has_canonical_format: +result.sum_duplicates() Review comment: @mrkn - wouldn't is_canonical guarantee no duplicates if indices were row- or column-ordered? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450332885 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: Adding. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7643: ARROW-9331: [C++] Improve the performance of Tensor-to-SparseTensor conversion
github-actions[bot] commented on pull request #7643: URL: https://github.com/apache/arrow/pull/7643#issuecomment-654284382 https://issues.apache.org/jira/browse/ARROW-9331 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7555: ARROW-9238: [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
pitrou commented on a change in pull request #7555: URL: https://github.com/apache/arrow/pull/7555#discussion_r450285869 ## File path: cpp/src/arrow/testing/random.h ## @@ -140,6 +140,17 @@ class ARROW_TESTING_EXPORT RandomArrayGenerator { std::shared_ptr Int64(int64_t size, int64_t min, int64_t max, double null_probability = 0); + /// \brief Generates a random HalfFloatArray + /// + /// \param[in] size the size of the array to generate + /// \param[in] min the lower bound of the uniform distribution + /// \param[in] max the upper bound of the uniform distribution Review comment: Updating the comment is enough. As long as the actual values are not processed, it doesn't really matter. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
github-actions[bot] commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654294358 Revision: a550a761781e8a7e1ebe163642a1c6fae98dad52 Submitted crossbow builds: [ursa-labs/crossbow @ actions-380](https://github.com/ursa-labs/crossbow/branches/all?query=actions-380) |Task|Status| ||--| |test-conda-python-3.7-spark-master|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-380-github-test-conda-python-3.7-spark-master)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-380-github-test-conda-python-3.7-spark-master)| This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs edited a comment on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs edited a comment on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-654308059 @pitrou updated. I assume now we propagate the null values from the selected child array, please double check the union tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450317608 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: I think `GetScalar()` should return a Scalar with the same type as the Array, so in case of unions with the union type rather than the type selected by the union type id (the union scalar's value is another scalar with the type id). I don't have a strong opinion on this though. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450318972 ## File path: cpp/src/arrow/scalar.cc ## @@ -185,6 +192,35 @@ DictionaryScalar::DictionaryScalar(std::shared_ptr type) 0) .ValueOrDie()} {} +Result> DictionaryScalar::GetEncodedValue() const { + const auto& dict_type = checked_cast(*type); + + if (!is_valid) { +return MakeNullScalar(dict_type.value_type()); + } + + int64_t index_value = 0; + switch (dict_type.index_type()->id()) { +case Type::INT8: + index_value = checked_cast(*value.index).value; + break; +case Type::INT16: + index_value = checked_cast(*value.index).value; + break; +case Type::INT32: + index_value = checked_cast(*value.index).value; + break; +case Type::INT64: + index_value = checked_cast(*value.index).value; + break; +default: + return Status::Invalid("Not implemented dictionary index type"); Review comment: Updating. ## File path: cpp/src/arrow/scalar_test.cc ## @@ -80,6 +84,15 @@ TYPED_TEST(TestNumericScalar, Basics) { auto dyn_null_value = MakeNullScalar(expected_type); ASSERT_EQ(*null_value, *dyn_null_value); + + // test Array.GetScalar + auto arr = ArrayFromJSON(expected_type, "[null, 1, 2]"); + ASSERT_OK_AND_ASSIGN(auto null, arr->GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto one, arr->GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto two, arr->GetScalar(2)); + ASSERT_TRUE(null->Equals(*null_value)); + ASSERT_TRUE(one->Equals(ScalarType(1))); + ASSERT_TRUE(two->Equals(ScalarType(2))); Review comment: Not yet, adding check for it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r450344074 ## File path: cpp/src/arrow/scalar_test.cc ## @@ -451,6 +565,127 @@ TEST(TestStructScalar, FieldAccess) { ASSERT_RAISES(Invalid, abc.field(5).status()); ASSERT_RAISES(Invalid, abc.field("c").status()); + + ASSERT_OK_AND_ASSIGN(auto d, abc.field("d")); + ASSERT_TRUE(d->Equals(MakeNullScalar(int64(; +} + +TEST(TestDictionaryScalar, Basics) { + auto ty = dictionary(int8(), utf8()); + auto dict = ArrayFromJSON(utf8(), "[\"alpha\", \"beta\", \"gamma\"]"); + + DictionaryScalar::ValueType alpha; + ASSERT_OK_AND_ASSIGN(alpha.index, MakeScalar(int8(), 0)); + alpha.dictionary = dict; + + DictionaryScalar::ValueType gamma; + ASSERT_OK_AND_ASSIGN(gamma.index, MakeScalar(int8(), 2)); + gamma.dictionary = dict; + + auto scalar_null = MakeNullScalar(ty); + auto scalar_alpha = DictionaryScalar(alpha, ty); + auto scalar_gamma = DictionaryScalar(gamma, ty); + + ASSERT_OK_AND_ASSIGN( + auto encoded_null, + checked_cast(*scalar_null).GetEncodedValue()); + ASSERT_TRUE(encoded_null->Equals(MakeNullScalar(utf8(; + + ASSERT_OK_AND_ASSIGN( + auto encoded_alpha, + checked_cast(scalar_alpha).GetEncodedValue()); + ASSERT_TRUE(encoded_alpha->Equals(MakeScalar("alpha"))); + + ASSERT_OK_AND_ASSIGN( + auto encoded_gamma, + checked_cast(scalar_gamma).GetEncodedValue()); + ASSERT_TRUE(encoded_gamma->Equals(MakeScalar("gamma"))); + + // test Array.GetScalar + DictionaryArray arr(ty, ArrayFromJSON(int8(), "[2, 0, 1, null]"), dict); + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_OK_AND_ASSIGN(auto last, arr.GetScalar(3)); + + ASSERT_TRUE(first->Equals(scalar_gamma)); + ASSERT_TRUE(second->Equals(scalar_alpha)); + ASSERT_TRUE(last->Equals(scalar_null)); +} + +TEST(TestSparseUnionScalar, Basics) { + auto ty = sparse_union({field("string", utf8()), field("number", uint64())}); + + auto alpha = MakeScalar("alpha"); + auto beta = MakeScalar("beta"); + ASSERT_OK_AND_ASSIGN(auto two, MakeScalar(uint64(), 2)); + + auto scalar_alpha = SparseUnionScalar(alpha, ty); + auto scalar_beta = SparseUnionScalar(beta, ty); + auto scalar_two = SparseUnionScalar(two, ty); + + // test Array.GetScalar + std::vector> children{ + ArrayFromJSON(utf8(), "[\"alpha\", \"\", \"beta\", null, \"gamma\"]"), + ArrayFromJSON(uint64(), "[1, 2, 11, 22, null]")}; + + auto type_ids = ArrayFromJSON(int8(), "[0, 1, 0, 0, 1]"); + SparseUnionArray arr(ty, 5, children, type_ids->data()->buffers[1]); + ASSERT_OK(arr.ValidateFull()); + + ASSERT_OK_AND_ASSIGN(auto first, arr.GetScalar(0)); + ASSERT_TRUE(first->Equals(scalar_alpha)); + + ASSERT_OK_AND_ASSIGN(auto second, arr.GetScalar(1)); + ASSERT_TRUE(second->Equals(scalar_two)); + + ASSERT_OK_AND_ASSIGN(auto third, arr.GetScalar(2)); + ASSERT_TRUE(third->Equals(scalar_beta)); + + ASSERT_OK_AND_ASSIGN(auto fourth, arr.GetScalar(3)); + ASSERT_TRUE(fourth->Equals(MakeNullScalar(ty))); Review comment: So for UnionScalar we have two `is_valid` flags, une for the union scalar and one for the underlying value scalar. Currently `MakeNullScalar()` sets the validity flag for the union scalar and leaves the value scalar uninitialized. What do you think, shall we initialize the value scalar as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rok commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index
rok commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-654347134 @mrkn - sorry I couldn't spend time on this earlier. I think pyarrow part is ok (except for the already going discussions). Would using the canonical flag make value assignment in [`MakeTensorFromSparseCOOTensor`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/tensor/coo_converter.cc#L224) faster? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects
kszucs commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r450361126 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -635,6 +635,37 @@ def test_make_fragment_from_buffer(): assert pickled.to_table().equals(fragment.to_table()) +@pytest.mark.parquet +def test_make_parquet_fragment_from_buffer(): +import pyarrow.parquet as pq + +table = pa.table([['a', 'b', 'c'], + [12, 11, 10], + ['dog', 'cat', 'rabbit']], + names=['alpha', 'num', 'animal']) + +out = pa.BufferOutputStream() +pq.write_table(table, out) + +buffer = out.getvalue() + +formats = [ +ds.ParquetFileFormat(), +ds.ParquetFileFormat( +read_options=ds.ParquetReadOptions( +use_buffered_stream=True, +buffer_size=4096, Review comment: @jorisvandenbossche updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7642: ARROW-9329: [C++][Gandiva] Implement castTimestampToDate function in gandiva
github-actions[bot] commented on pull request #7642: URL: https://github.com/apache/arrow/pull/7642#issuecomment-654192802 https://issues.apache.org/jira/browse/ARROW-9329 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7635: ARROW-1587: [C++] implement fill null
wesm commented on a change in pull request #7635: URL: https://github.com/apache/arrow/pull/7635#discussion_r449802099 ## File path: cpp/src/arrow/compute/api_scalar.h ## @@ -259,6 +259,27 @@ Result IsValid(const Datum& values, ExecContext* ctx = NULLPTR); ARROW_EXPORT Result IsNull(const Datum& values, ExecContext* ctx = NULLPTR); +struct ARROW_EXPORT FillNullOptions : public FunctionOptions { + explicit FillNullOptions(Datum fill_value) : fill_value(std::move(fill_value)) {} + + Datum fill_value; +}; Review comment: See comments above. I think this should be implemented as a binary function since the "fill values" could be provided by an array. This also will allow the execution layer to (in the near future) insert implicit casts where needed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou opened a new pull request #7644: ARROW-9330: [C++] Fix crash and undefined behaviour on corrupt IPC input
pitrou opened a new pull request #7644: URL: https://github.com/apache/arrow/pull/7644 Should fix the following issues: * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23910 * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23916 * https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23921 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
jorisvandenbossche commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654267493 @github-actions crossbow submit test-conda-python-3.7-spark-master This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface
nealrichardson opened a new pull request #7648: URL: https://github.com/apache/arrow/pull/7648 In terms of number of lines of code, this wasn't bad, though I don't know how efficient these methods are. Maybe there's a better way The one thing that would be lost is any metadata attached to the Table schema because the Table is reconstructed from its ChunkedArrays without schema. I wonder if we could export the Schema on its own--the existing `_export_to_c`/`_import_from_c` methods take both an array pointer and a schema pointer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7620: ARROW-9013: [C++] Validate CMake options
kou commented on pull request #7620: URL: https://github.com/apache/arrow/pull/7620#issuecomment-654445290 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #7620: ARROW-9013: [C++] Validate CMake options
kou closed pull request #7620: URL: https://github.com/apache/arrow/pull/7620 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] rok commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices
rok commented on pull request #7044: URL: https://github.com/apache/arrow/pull/7044#issuecomment-654457797 Hey @mrkn - sorry I didn't have capacity to reply for a while. If I remember correctly only SciPy has this architecture of COO index having two vectors instead of one 2D matrix. Question: could we handle this as a special case of COO tensor rather than a new type? Could we serialize `(row, col)` data as a single row major tensor of COO type and only deserialize it into the SciPy layout if desired? (I'm asking because I'm not sure if such approach is feasible) If we go for a new type - could I propose a name SparseCOOMatrix (as opposed to n-dimensional SparseCOOTensor). It could perhaps be shortened to `COOM`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] BryanCutler commented on pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion
BryanCutler commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-654462400 Yeah, it seems possible to fix on the Spark side, but I haven't been able to take a close look at this yet. I'll try to do that as soon as possible and report back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7648: ARROW-8301: [R] Handle ChunkedArray and Table in C data interface
github-actions[bot] commented on pull request #7648: URL: https://github.com/apache/arrow/pull/7648#issuecomment-654431173 https://issues.apache.org/jira/browse/ARROW-8301 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7637: ARROW-9137: [GLib] Add gparquet_arrow_file_reader_read_row_group()
kou commented on pull request #7637: URL: https://github.com/apache/arrow/pull/7637#issuecomment-654434362 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #7637: ARROW-9137: [GLib] Add gparquet_arrow_file_reader_read_row_group()
kou closed pull request #7637: URL: https://github.com/apache/arrow/pull/7637 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value
kou commented on pull request #7639: URL: https://github.com/apache/arrow/pull/7639#issuecomment-654434646 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #7647: ARROW-9337: [R] On C++ library build failure, give an unambiguous message
nealrichardson closed pull request #7647: URL: https://github.com/apache/arrow/pull/7647 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value
kou closed pull request #7639: URL: https://github.com/apache/arrow/pull/7639 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou closed pull request #7646: ARROW-7237: [C++] Use Result in arrow/json APIs
kou closed pull request #7646: URL: https://github.com/apache/arrow/pull/7646 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm opened a new pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled
wesm opened a new pull request #7651: URL: https://github.com/apache/arrow/pull/7651 I discovered a place where clang-format was left disabled by accident relatively early in a file and so doing the reformatting as a separate PR to not have the diff noise in another patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray
wesm closed pull request #7596: URL: https://github.com/apache/arrow/pull/7596 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray
wesm commented on a change in pull request #7596: URL: https://github.com/apache/arrow/pull/7596#discussion_r450548152 ## File path: cpp/src/arrow/array/array_binary.cc ## @@ -24,11 +24,27 @@ #include "arrow/type.h" #include "arrow/util/checked_cast.h" #include "arrow/util/logging.h" +#include "arrow/util/utf8.h" namespace arrow { using internal::checked_cast; +namespace { + +template +Status ValidateStringData(const StringArrayType& array) { + util::InitializeUTF8(); + for (int64_t i = 0; i < array.length(); ++i) { +if (!array.IsNull(i) && !util::ValidateUTF8(array.GetView(i))) { + return Status::Invalid("Invalid UTF8 sequence at string index ", i); +} + } + return Status::OK(); +} Review comment: If this became a perf bottleneck, this would be pretty easy to speed up This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled
github-actions[bot] commented on pull request #7651: URL: https://github.com/apache/arrow/pull/7651#issuecomment-654532558 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm merged pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled
wesm merged pull request #7651: URL: https://github.com/apache/arrow/pull/7651 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou opened a new pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()
kou opened a new pull request #7652: URL: https://github.com/apache/arrow/pull/7652 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] shiro615 commented on pull request #7639: ARROW-4600: [Ruby] Arrow::DictionaryArray#[] returns dictionary value
shiro615 commented on pull request #7639: URL: https://github.com/apache/arrow/pull/7639#issuecomment-654531677 Thank you for working on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7651: [C++][MINOR] Enable clang-format in a place where it was accidentally disabled
wesm commented on pull request #7651: URL: https://github.com/apache/arrow/pull/7651#issuecomment-654532749 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7652: ARROW-9341: [GLib] Use arrow::Datum version Take()
github-actions[bot] commented on pull request #7652: URL: https://github.com/apache/arrow/pull/7652#issuecomment-654557935 https://issues.apache.org/jira/browse/ARROW-9341 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder
github-actions[bot] commented on pull request #7649: URL: https://github.com/apache/arrow/pull/7649#issuecomment-654476061 https://issues.apache.org/jira/browse/ARROW-9336 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions
wesm commented on pull request #7632: URL: https://github.com/apache/arrow/pull/7632#issuecomment-654491519 Any opinions about these new APIs? @xhochy or @pitrou could you take a quick look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #7650: ARROW-9340: [R] Use CRAN version of decor package
nealrichardson opened a new pull request #7650: URL: https://github.com/apache/arrow/pull/7650 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #7650: ARROW-9340: [R] Use CRAN version of decor package
github-actions[bot] commented on pull request #7650: URL: https://github.com/apache/arrow/pull/7650#issuecomment-654501588 https://issues.apache.org/jira/browse/ARROW-9340 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou opened a new pull request #7649: ARROW-9336: [Ruby] Add support for missing keys in StructArrayBuilder
kou opened a new pull request #7649: URL: https://github.com/apache/arrow/pull/7649 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
wesm commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-654492131 +1. Thanks @kszucs for this work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm closed pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings
wesm closed pull request #7519: URL: https://github.com/apache/arrow/pull/7519 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion
nealrichardson commented on pull request #7623: URL: https://github.com/apache/arrow/pull/7623#issuecomment-654497318 @jorisvandenbossche #7519 has merged so I just rebased this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices
mrkn commented on pull request #7044: URL: https://github.com/apache/arrow/pull/7044#issuecomment-654582143 >> If we go for a new type - could I propose a name SparseCOOMatrix (as opposed to n-dimensional SparseCOOTensor). It could perhaps be shortened to COOM? > > The implementation in this pull-request can handle more than 2-dimension. Oh, I forgot to change the title of this pull-request. Sorry. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on pull request #7044: ARROW-6485: [Format][C++] Support the format of a COO sparse matrix that has separated row and column indices
mrkn commented on pull request #7044: URL: https://github.com/apache/arrow/pull/7044#issuecomment-654581941 > Hey @mrkn - sorry I didn't have capacity to reply for a while. No problem. Thank you for your cooperation! > If I remember correctly only SciPy has this architecture of COO index having two vectors instead of one 2D matrix. Not only scipy, but also [SuiteSparse](https://github.com/DrTimothyAldenDavis/SuiteSparse) employs the split format. > Question: could we handle this as a special case of COO tensor rather than a new type? Could we serialize `(row, col)` data as a single row major tensor of COO type and only deserialize it into the SciPy layout if desired? (I'm asking because I'm not sure if such approach is feasible) > Although we can handle the split-format as an internal variation of SparseCOOIndex, we still need to introduce the new flatbuffer type. > If we go for a new type - could I propose a name SparseCOOMatrix (as opposed to n-dimensional SparseCOOTensor). It could perhaps be shortened to `COOM`? The implementation in this pull-request can handle more than 2-dimension. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org