[GitHub] [arrow] emkornfield commented on a change in pull request #7965: ARROW-9737: [C++][Gandiva] Add bitwise_xor() for integers
emkornfield commented on a change in pull request #7965: URL: https://github.com/apache/arrow/pull/7965#discussion_r478145554

## File path: cpp/src/gandiva/precompiled/arithmetic_ops_test.cc

```diff
@@ -120,6 +120,14 @@ TEST(TestArithmeticOps, TestBitwiseOps) {
                             0xFF7E);
   EXPECT_EQ(bitwise_or_int64_int64(0x6A5B1, 0x0), 0x6A5B1);
 
+  // bitwise XOR
+  EXPECT_EQ(bitwise_xor_int32_int32(0x0147D, 0x17159), 0x16524);
+  EXPECT_EQ(bitwise_xor_int32_int32(0xFFCC, 0x0297), 0XFD5B);
+  EXPECT_EQ(bitwise_xor_int32_int32(0x000, 0x285), 0x285);
+  EXPECT_EQ(bitwise_xor_int64_int64(0x563672F83, 0x0D9FCF85B), 0x5BA9BD7D8);
+  EXPECT_EQ(bitwise_xor_int64_int64(0xFFDA8F6A, 0x791C), 0X25F676);
+  EXPECT_EQ(bitwise_xor_int64_int64(0x6A5B1, 0x0), 0x6A5B1);
+
```

Review comment: Could you add a test showing that XOR of a value with itself is zero?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mrkn commented on a change in pull request #7898: ARROW-9642: [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices
mrkn commented on a change in pull request #7898: URL: https://github.com/apache/arrow/pull/7898#discussion_r478143508

## File path: cpp/src/arrow/array/array_dict_test.cc

```diff
@@ -904,6 +904,67 @@ TEST(TestDecimalDictionaryBuilder, DoubleTableSize) {
   ASSERT_TRUE(expected.Equals(result));
 }
 
+TEST(TestNullDictionaryBuilder, Basic) {
+  // MakeBuilder
+  auto dict_type = dictionary(int8(), null());
+  std::unique_ptr<ArrayBuilder> boxed_builder;
+  ASSERT_OK(MakeBuilder(default_memory_pool(), dict_type, &boxed_builder));
+  auto& builder = checked_cast<DictionaryBuilder<NullType>&>(*boxed_builder);
+
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_OK(builder.AppendNull());
+  ASSERT_EQ(3, builder.length());
+  ASSERT_EQ(3, builder.null_count());
+
+  ASSERT_OK(builder.AppendNulls(4));
+  ASSERT_EQ(7, builder.length());
+  ASSERT_EQ(7, builder.null_count());
+
+  auto int_array = ArrayFromJSON(int8(), "[0, 1, 0, null]");
+  ASSERT_OK(builder.AppendArray(*int_array));
```

Review comment: @pitrou I tried to directly insert `DCHECK` in `AppendArray`, but I found it is difficult because `AppendArray` is defined in `builder_dict.h`. That is a public header file, so we cannot use `DCHECK` there. Instead of directly inserting `DCHECK`, the candidate patch is below.
```diff
diff --git a/cpp/src/arrow/array/builder_dict.cc b/cpp/src/arrow/array/builder_dict.cc
index 54fd94856..dc239f268 100644
--- a/cpp/src/arrow/array/builder_dict.cc
+++ b/cpp/src/arrow/array/builder_dict.cc
@@ -189,5 +189,11 @@ Status DictionaryMemoTable::InsertValues(const Array& array) {
 
 int32_t DictionaryMemoTable::size() const { return impl_->size(); }
 
+#ifndef NDEBUG
+void CheckArrayType(const Type::type type_id, const Array& array,
+                    const std::string& message) {
+  DCHECK_EQ(type_id, array.type_id()) << message;
+}
+#endif
+
 }  // namespace internal
 }  // namespace arrow
diff --git a/cpp/src/arrow/array/builder_dict.h b/cpp/src/arrow/array/builder_dict.h
index f8c77a5b0..a0d02fd1b 100644
--- a/cpp/src/arrow/array/builder_dict.h
+++ b/cpp/src/arrow/array/builder_dict.h
@@ -93,6 +93,10 @@ class ARROW_EXPORT DictionaryMemoTable {
   std::unique_ptr impl_;
 };
 
+#ifndef NDEBUG
+void CheckArrayType(const Type::type type_id, const Array& array,
+                    const std::string& message);
+#endif
+
 /// \brief Array builder for created encoded DictionaryArray from
 /// dense array
 ///
@@ -248,6 +252,10 @@ class DictionaryBuilderBase : public ArrayBuilder {
                      const Array& array) {
     using ArrayType = typename TypeTraits<T>::ArrayType;
 
+#ifndef NDEBUG
+    CheckArrayType(T::type_id, array, "Wrong value type of array to be appended");
+#endif
+
     const auto& concrete_array = static_cast<const ArrayType&>(array);
     for (int64_t i = 0; i < array.length(); i++) {
       if (array.IsNull(i)) {
@@ -406,6 +414,9 @@ class DictionaryBuilderBase : public ArrayBuilder {
   /// \brief Append a whole dense array to the builder
   Status AppendArray(const Array& array) {
+#ifndef NDEBUG
+    CheckArrayType(NullType::type_id, array, "Wrong value type of array to be appended");
+#endif
     for (int64_t i = 0; i < array.length(); i++) {
      ARROW_RETURN_NOT_OK(AppendNull());
    }
```

I don't think this approach is good. Is it acceptable to proceed this way?
By the way, I found that `AppendArray` for the `DictionaryBuilderBase` of `FixedSizeBinaryType` checks the array type at the beginning of the function and returns `Status::Invalid` when the types mismatch. I guess that if this kind of check is acceptable for `FixedSizeBinaryType`, the same kind of check can be put in the other `AppendArray` functions.
[GitHub] [arrow] emkornfield commented on pull request #7883: ARROW-9620: [C++] Added split function to Gandiva
emkornfield commented on pull request #7883: URL: https://github.com/apache/arrow/pull/7883#issuecomment-681545786 +1 merging.
[GitHub] [arrow] emkornfield commented on a change in pull request #7973: ARROW-8493: [C++][Parquet] Start populating repeated ancestor definition
emkornfield commented on a change in pull request #7973: URL: https://github.com/apache/arrow/pull/7973#discussion_r478100513

## File path: cpp/src/parquet/arrow/arrow_schema_test.cc

```diff
@@ -1140,5 +1144,244 @@ TEST(TestFromParquetSchema, CorruptMetadata) {
   ASSERT_RAISES(IOError, FromParquetSchema(parquet_schema, props, &result_schema));
 }
 
+::arrow::Result<std::deque<LevelInfo>> RootToTreeLeafLevels(
+    const SchemaManifest& manifest, int column_number) {
+  std::deque<LevelInfo> out;
+  const SchemaField* field;
+  RETURN_NOT_OK(manifest.GetColumnField(column_number, &field));
+  while (field != nullptr) {
+    out.push_front(field->level_info);
+    field = manifest.GetParent(field);
+  }
+  return out;
+}
+
+class TestLevels : public ::testing::Test {
+ public:
+  virtual void SetUp() {}
+
+  ::arrow::Status MaybeSetParquetSchema(const NodePtr& column) {
+    descriptor_.reset(new SchemaDescriptor());
+    manifest_.reset(new SchemaManifest());
+    descriptor_->Init(GroupNode::Make("root", Repetition::REQUIRED, {column}));
+    return SchemaManifest::Make(descriptor_.get(),
+                                std::shared_ptr<const KeyValueMetadata>(),
+                                ArrowReaderProperties(), manifest_.get());
+  }
+  void SetParquetSchema(const NodePtr& column) {
+    ASSERT_OK(MaybeSetParquetSchema(column));
+  }
+
+ protected:
+  std::unique_ptr<SchemaDescriptor> descriptor_;
+  std::unique_ptr<SchemaManifest> manifest_;
+};
+
+TEST_F(TestLevels, TestPrimitive) {
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::REQUIRED, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(std::deque<LevelInfo> levels,
+                       RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(levels, ElementsAre(LevelInfo{/*null_slot_usage=*/1,
+                                            /*def_level=*/0, /*rep_level=*/0,
+                                            /*ancestor_list_def_level*/ 0}));
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::OPTIONAL, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(levels, ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1,
+                                            /*rep_level=*/0,
+                                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: list(bool not null) not null
+  SetParquetSchema(
+      PrimitiveNode::Make("node_name", Repetition::REPEATED, ParquetType::BOOLEAN));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/1,
+                            /*ancestor_list_def_level*/ 0},  // List Field
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/1,
+                            /*ancestor_list_def_level*/ 1}));  // primitive field
+}
+
+TEST_F(TestLevels, TestSimpleGroups) {
+  // Arrow schema: struct(child: struct(inner: boolean not null))
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::OPTIONAL,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner", Repetition::REQUIRED, ParquetType::BOOLEAN)})}));
+  ASSERT_OK_AND_ASSIGN(std::deque<LevelInfo> levels,
+                       RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: struct(child: struct(inner: boolean))
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::OPTIONAL,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner", Repetition::OPTIONAL, ParquetType::BOOLEAN)})}));
+  ASSERT_OK_AND_ASSIGN(levels, RootToTreeLeafLevels(*manifest_, /*column_number=*/0));
+  EXPECT_THAT(
+      levels,
+      ElementsAre(LevelInfo{/*null_slot_usage=*/1, /*def_level=*/1, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/2, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0},
+                  LevelInfo{/*null_slot_usage=*/1, /*def_level=*/3, /*rep_level=*/0,
+                            /*ancestor_list_def_level*/ 0}));
+
+  // Arrow schema: struct(child: struct(inner: boolean)) not null
+  SetParquetSchema(GroupNode::Make(
+      "parent", Repetition::REQUIRED,
+      {GroupNode::Make(
+          "child", Repetition::OPTIONAL,
+          {PrimitiveNode::Make("inner",
```
[GitHub] [arrow] emkornfield commented on pull request #7887: ARROW-9304: [C++] Add "AppendEmpty" builder APIs for use inside StructBuilder::AppendNull
emkornfield commented on pull request #7887: URL: https://github.com/apache/arrow/pull/7887#issuecomment-681356508 @pitrou does this seem ok now?
[GitHub] [arrow] emkornfield commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support
emkornfield commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-681347717 @rj-atw were you able to figure out CI? It looks like this also needs to be rebased.
[GitHub] [arrow] emkornfield commented on pull request #7287: ARROW-8771: [C++] Add boost/process library to build support
emkornfield commented on pull request #7287: URL: https://github.com/apache/arrow/pull/7287#issuecomment-681346310 Is there a JIRA open to upgrade to thrift 0.13.0?
[GitHub] [arrow] emkornfield commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader
emkornfield commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-681343742 @wjones1 did you have a chance to figure out CI issues?
[GitHub] [arrow] emkornfield commented on pull request #6770: ARROW-7842: [Rust] [Parquet] implement array_reader for list type columns
emkornfield commented on pull request #6770: URL: https://github.com/apache/arrow/pull/6770#issuecomment-681341031 @nevi-me did your other PRs supersede this?
[GitHub] [arrow] emkornfield commented on pull request #6676: ARROW-8175: [Python] Setup type checking with mypy
emkornfield commented on pull request #6676: URL: https://github.com/apache/arrow/pull/6676#issuecomment-681337073 @jorisvandenbossche @xhochy next steps here?
[GitHub] [arrow] emkornfield commented on pull request #6506: ARROW-7878: [C++][Compute] Draft LogicalPlan classes
emkornfield commented on pull request #6506: URL: https://github.com/apache/arrow/pull/6506#issuecomment-681335327 @wesm @fsaintjacques @kszucs do you want to keep this PR open or resurrect it at some point in the future?
[GitHub] [arrow] emkornfield commented on pull request #7276: ARROW-8947: [Java] MapWithOrdinal javadoc doesn't describe actual behaviour
emkornfield commented on pull request #7276: URL: https://github.com/apache/arrow/pull/7276#issuecomment-681333219 @rymurr I wonder if you've had a chance to discuss with @jacques-n offline to see if there are any sharp edges here?
[GitHub] [arrow] emkornfield commented on issue #8025: ParquetWriter creates bad files when passed pyarrow schema from pyarrow table?
emkornfield commented on issue #8025: URL: https://github.com/apache/arrow/issues/8025#issuecomment-681330368 @losze1cj were you able to test out the suggestions?
[GitHub] [arrow] emkornfield closed issue #8054: Arrow Flight from Python to JavaScript
emkornfield closed issue #8054: URL: https://github.com/apache/arrow/issues/8054
[GitHub] [arrow] emkornfield commented on issue #8054: Arrow Flight from Python to JavaScript
emkornfield commented on issue #8054: URL: https://github.com/apache/arrow/issues/8054#issuecomment-681324547 Hi Alex, The JIRA looks like a reasonable start. Thank you for filing it. I'll close this and tag some people who might be able to give you an answer there.
[GitHub] [arrow] emkornfield commented on pull request #8046: ARROW-9850: [Go] Defer should not be used inside a loop
emkornfield commented on pull request #8046: URL: https://github.com/apache/arrow/pull/8046#issuecomment-681324037 @sbinet do you have time to take a look?
[GitHub] [arrow] emkornfield commented on pull request #8059: [MINOR] [java][jdbc] Improve error messages for unsupported types.
emkornfield commented on pull request #8059: URL: https://github.com/apache/arrow/pull/8059#issuecomment-681323737 @mariusvniekerk thank you for the PR. Generally it looks good, but could you add unit tests covering the cases that generated the NPE? Also, would you mind opening a JIRA to track this issue?
[GitHub] [arrow] emkornfield closed pull request #8040: PARQUET-1904: [C++] Export file_offset in RowGroupMetaData
emkornfield closed pull request #8040: URL: https://github.com/apache/arrow/pull/8040
[GitHub] [arrow] paddyhoran closed pull request #8051: ARROW-9853: [RUST] Implement take kernel for dictionary arrays
paddyhoran closed pull request #8051: URL: https://github.com/apache/arrow/pull/8051
[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData
emkornfield commented on pull request #8040: URL: https://github.com/apache/arrow/pull/8040#issuecomment-681317956 +1 thank you @scober
[GitHub] [arrow] github-actions[bot] commented on pull request #8061: ARROW-9723: [C++][Compute] Count NaN in mode kernel
github-actions[bot] commented on pull request #8061: URL: https://github.com/apache/arrow/pull/8061#issuecomment-681316938 https://issues.apache.org/jira/browse/ARROW-9723
[GitHub] [arrow] cyb70289 opened a new pull request #8061: ARROW-9723: [C++][Compute] Count NaN in mode kernel
cyb70289 opened a new pull request #8061: URL: https://github.com/apache/arrow/pull/8061 For array [NaN, NaN, 1], output mode = NaN, count = 2
[GitHub] [arrow] github-actions[bot] commented on pull request #8060: ARROW-9871: [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
github-actions[bot] commented on pull request #8060: URL: https://github.com/apache/arrow/pull/8060#issuecomment-681311560 https://issues.apache.org/jira/browse/ARROW-9871
[GitHub] [arrow] jianxind edited a comment on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
jianxind edited a comment on pull request #8049: URL: https://github.com/apache/arrow/pull/8049#issuecomment-681200942 I will create a JIRA for adding upper case to ARROW_USER_SIMD_LEVEL also. PR: https://github.com/apache/arrow/pull/8060
[GitHub] [arrow] jianxind opened a new pull request #8060: ARROW-9871: [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
jianxind opened a new pull request #8060: URL: https://github.com/apache/arrow/pull/8060 Follow the convention of ARROW_SIMD_LEVEL. Signed-off-by: Frank Du
[GitHub] [arrow] kou commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
kou commented on pull request #8047: URL: https://github.com/apache/arrow/pull/8047#issuecomment-681285381 OK. The ML thread: https://lists.apache.org/thread.html/r3d4ab3ccc8db7a7a68baa727bf9b6911930b61609c0faaed14e2b773%40%3Cdev.arrow.apache.org%3E
[GitHub] [arrow] emkornfield commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
emkornfield commented on pull request #8047: URL: https://github.com/apache/arrow/pull/8047#issuecomment-681265603 We should probably hold off merging more PRs for big-endian architectures until we reach consensus on the ML about supporting them in general.
[GitHub] [arrow] github-actions[bot] commented on pull request #8059: [MINOR] [java][jdbc] Improve error messages for unsupported types.
github-actions[bot] commented on pull request #8059: URL: https://github.com/apache/arrow/pull/8059#issuecomment-681202023 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
[GitHub] [arrow] jianxind commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
jianxind commented on pull request #8049: URL: https://github.com/apache/arrow/pull/8049#issuecomment-681200942 I will create a JIRA for adding upper case to ARROW_USER_SIMD_LEVEL also.
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
jianxind commented on a change in pull request #7871: URL: https://github.com/apache/arrow/pull/7871#discussion_r477809857

## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc

```diff
@@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) {
   AssertDatumsEqual(explicit_defaults, no_options_provided);
 }
 
+template <typename ArrowType>
+using MinMaxResult = std::pair<typename ArrowType::c_type, typename ArrowType::c_type>;
+
+template <typename ArrowType>
+static enable_if_integer<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::max();
+  T max = std::numeric_limits<T>::min();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::min(min, values[i]);
+        max = std::max(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::min(min, values[i]);
+      max = std::max(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+static enable_if_floating_point<ArrowType, MinMaxResult<ArrowType>> NaiveMinMax(
+    const Array& array) {
+  using T = typename ArrowType::c_type;
+  using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+
+  const auto& array_numeric = reinterpret_cast<const ArrayType&>(array);
+  const auto values = array_numeric.raw_values();
+
+  if (array.length() <= array.null_count()) {  // All null values
+    return {static_cast<T>(0), static_cast<T>(0)};
+  }
+
+  T min = std::numeric_limits<T>::infinity();
+  T max = -std::numeric_limits<T>::infinity();
+  if (array.null_count() != 0) {  // Some values are null
+    internal::BitmapReader reader(array.null_bitmap_data(), array.offset(),
+                                  array.length());
+    for (int64_t i = 0; i < array.length(); i++) {
+      if (reader.IsSet()) {
+        min = std::fmin(min, values[i]);
+        max = std::fmax(max, values[i]);
+      }
+      reader.Next();
+    }
+  } else {  // All true values
+    for (int64_t i = 0; i < array.length(); i++) {
+      min = std::fmin(min, values[i]);
+      max = std::fmax(max, values[i]);
+    }
+  }
+
+  return {min, max};
+}
+
+template <typename ArrowType>
+void ValidateMinMax(const Array& array) {
+  using Traits = TypeTraits<ArrowType>;
+  using ScalarType = typename Traits::ScalarType;
+
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
+
+  const auto& out_max = checked_cast<const ScalarType&>(*value.value[1]);
+  ASSERT_EQ(expected.second, out_max.value);
+}
+
+template <typename ArrowType>
+class TestRandomNumericMinMaxKernel : public ::testing::Test {};
+
+TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes);
+TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) {
+  auto rand = random::RandomArrayGenerator(0x8afc055);
+  // Test size up to 1<<13 (8192).
```

Review comment: The new implementation is based on BitBlockCounter; I am afraid a size of 1024 with a null_probability of 0.001 (only one valid value) doesn't give enough coverage.
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
jianxind commented on a change in pull request #7871: URL: https://github.com/apache/arrow/pull/7871#discussion_r477794871

## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc

The quoted context is the same `NaiveMinMax`/`ValidateMinMax` hunk shown in the previous comment; this comment is anchored on the first assertion in `ValidateMinMax`:

```diff
+  ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array));
+  const StructScalar& value = out.scalar_as<StructScalar>();
+
+  auto expected = NaiveMinMax<ArrowType>(array);
+  const auto& out_min = checked_cast<const ScalarType&>(*value.value[0]);
+  ASSERT_EQ(expected.first, out_min.value);
```

Review comment: Return null for both the expected test and MinMax.

```cpp
if (array.length() <= array.null_count()) {  // All null values
  return {static_cast<T>(0), static_cast<T>(0)};
}
```
[GitHub] [arrow] jianxind commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
jianxind commented on a change in pull request #7871: URL: https://github.com/apache/arrow/pull/7871#discussion_r477781881

## File path: cpp/src/arrow/compute/api_aggregate.h

```diff
@@ -130,23 +130,6 @@ Result<Datum> MinMax(const Datum& value,
                      const MinMaxOptions& options = MinMaxOptions::Defaults(),
                      ExecContext* ctx = NULLPTR);
 
-/// \brief Calculate the min / max of a numeric array.
-///
-/// This function returns both the min and max as a collection. The resulting
-/// datum thus consists of two scalar datums: {Datum(min), Datum(max)}
-///
-/// \param[in] array input array
-/// \param[in] options see MinMaxOptions for more information
-/// \param[in] ctx the function execution context, optional
-/// \return resulting datum containing a {min, max} collection
-///
-/// \since 1.0.0
-/// \note API not yet finalized
-ARROW_EXPORT
-Result<Datum> MinMax(const Array& array,
```

Review comment: Yes. There's no implementation for the Array overload of MinMax; calling MinMax with Array data hits a link error.
[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
jduo commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477768470 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/ServerInterceptorAdapter.java ## @@ -87,10 +87,12 @@ public ServerInterceptorAdapter(List> factories) { // Use LinkedHashMap to preserve insertion order final Map, FlightServerMiddleware> middlewareMap = new LinkedHashMap<>(); final MetadataAdapter headerAdapter = new MetadataAdapter(headers); +Context currentContext = Context.current(); for (final KeyFactory factory : factories) { final FlightServerMiddleware m; try { m = factory.factory.onCallStarted(info, headerAdapter); +currentContext = m.onAuthenticationSuccess(currentContext); Review comment: I added another input to onCallStarted() that the implementor can use to write context-specific variable values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
jduo commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477766787 ## File path: java/flight/flight-core/src/test/java/org/apache/arrow/flight/auth/TestBasicAuth.java ## @@ -44,29 +44,38 @@ import org.junit.Ignore; import org.junit.Test; +import com.google.common.base.Strings; import com.google.common.collect.ImmutableList; public class TestBasicAuth { private static final String USERNAME = "flight"; private static final String PASSWORD = "woohoo"; - private static final byte[] VALID_TOKEN = "my_token".getBytes(StandardCharsets.UTF_8); + private static final String VALID_TOKEN = "my_token"; + private FlightClient.Builder clientBuilder; private FlightClient client; private FlightServer server; private BufferAllocator allocator; @Test public void validAuth() { -client.authenticateBasic(USERNAME, PASSWORD); - Assert.assertTrue(ImmutableList.copyOf(client.listFlights(Criteria.ALL)).size() == 0); +try { + client = clientBuilder.callCredentials(new BasicAuthCallCredentials(USERNAME, PASSWORD)).build(); + client.handshake(); + Assert.assertTrue(ImmutableList.copyOf(client.listFlights(Criteria.ALL)).size() == 0); Review comment: Changed to assertTrue(isEmpty()). But let's focus on the design given this is a POC. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
kou commented on pull request #8047: URL: https://github.com/apache/arrow/pull/8047#issuecomment-681174260 This PR has been merged into master: https://github.com/apache/arrow/commit/92e01cc5ebe912f0ad1377eae765aa36478db52f We use our merge tool https://github.com/apache/arrow/blob/master/dev/merge_arrow_pr.py instead of GitHub's merge UI so that the corresponding JIRA issue is updated too. The PR is closed, but the change has been merged. Now you can use the CI job in https://github.com/apache/arrow/pull/8011 by rebasing on master.
[GitHub] [arrow] github-actions[bot] commented on pull request #8058: ARROW-9854: [R] Support reading/writing data to/from S3
github-actions[bot] commented on pull request #8058: URL: https://github.com/apache/arrow/pull/8058#issuecomment-681140852 https://issues.apache.org/jira/browse/ARROW-9854
[GitHub] [arrow] nealrichardson opened a new pull request #8058: ARROW-9854: [R] Support reading/writing data to/from S3
nealrichardson opened a new pull request #8058: URL: https://github.com/apache/arrow/pull/8058 - [x] read_parquet/feather/etc. from S3 (use FileSystem->OpenInputFile(path)) - [x] write_$FORMAT via FileSystem->OpenOutputStream(path) - [x] write_dataset (done? at least via URI) - [ ] for linux, an argument to install_arrow to help, assuming you've installed aws-sdk-cpp already (turn on ARROW_S3, AWSSDK_SOURCE=SYSTEM) - [ ] testing with minio on CI - [x] set up a real test bucket and user for e2e testing (credentials available on request) - [ ] update docs, vignettes, news Out of the current scope: - [ ] download dataset, i.e. copy files/directory recursively (needs ARROW-9867, ARROW-9868) - [ ] friendlier methods for interacting with/viewing a filesystem (ls, mkdir, etc.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] lidavidm commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
lidavidm commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477591584 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/CallOptions.java ## @@ -56,7 +56,7 @@ public static CallOption timeout(long duration, TimeUnit unit) { /** * CallOptions specific to GRPC stubs. */ - interface GrpcCallOption extends CallOption { + public interface GrpcCallOption extends CallOption { Review comment: This can't be public in this module - it should have a public subinterface in flight-grpc. ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/CallContext.java ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.arrow.flight; + +/** + * Tracks variables about the current request. + */ +public interface CallContext { Review comment: Let's try not to rename existing classes unnecessarily and instead come up with new names. ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightRuntimeException.java ## @@ -29,7 +29,7 @@ /** * Create a new exception from the given status. 
*/ - FlightRuntimeException(CallStatus status) { + public FlightRuntimeException(CallStatus status) { Review comment: This shouldn't need to be public; a CallStatus already has a toRuntimeException method.
[GitHub] [arrow] vivkong commented on pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
vivkong commented on pull request #8047: URL: https://github.com/apache/arrow/pull/8047#issuecomment-681105501 @kou Sorry, I'm not familiar with the process; will this PR be merged?
[GitHub] [arrow] kou closed pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
kou closed pull request #8047: URL: https://github.com/apache/arrow/pull/8047
[GitHub] [arrow] kou commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
kou commented on pull request #8049: URL: https://github.com/apache/arrow/pull/8049#issuecomment-681096064 I agree with you, including upper case.
[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
jduo commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477543928 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightClient.java ## @@ -84,18 +81,15 @@ * Create a Flight client from an allocator and a gRPC channel. */ FlightClient(BufferAllocator incomingAllocator, ManagedChannel channel, - List middleware) { + List middleware, CallCredentials callCredentials) { this.allocator = incomingAllocator.newChildAllocator("flight-client", 0, Long.MAX_VALUE); this.channel = channel; -final ClientInterceptor[] interceptors; -interceptors = new ClientInterceptor[]{authInterceptor, new ClientInterceptorAdapter(middleware)}; - // Create a channel with interceptors pre-applied for DoGet and DoPut -this.interceptedChannel = ClientInterceptors.intercept(channel, interceptors); +this.interceptedChannel = ClientInterceptors.intercept(channel, new ClientInterceptorAdapter(middleware)); -blockingStub = FlightServiceGrpc.newBlockingStub(interceptedChannel); -asyncStub = FlightServiceGrpc.newStub(interceptedChannel); +blockingStub = FlightServiceGrpc.newBlockingStub(interceptedChannel).withCallCredentials(callCredentials); Review comment: I've added a CredentialCallOptions class which creates and adds CallCredentials to the gRPC stub. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
jduo commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477543461 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/FlightClient.java ## @@ -156,23 +150,12 @@ } /** - * Authenticates with a username and password. Review comment: I renamed authenticate() to handshake() now, because authentication is now supplied as a CallOption. I brought back authenticateBasic() and also now have it return a CallOption containing the bearer token credentials if the server sent it back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jduo commented on a change in pull request #7994: ARROW-9804: [FlightRPC] Flight auth redesign
jduo commented on a change in pull request #7994: URL: https://github.com/apache/arrow/pull/7994#discussion_r477542825 ## File path: java/flight/flight-core/src/main/java/org/apache/arrow/flight/grpc/ServerInterceptorAdapter.java ## @@ -87,10 +87,12 @@ public ServerInterceptorAdapter(List> factories) { // Use LinkedHashMap to preserve insertion order final Map, FlightServerMiddleware> middlewareMap = new LinkedHashMap<>(); final MetadataAdapter headerAdapter = new MetadataAdapter(headers); +Context currentContext = Context.current(); for (final KeyFactory factory : factories) { final FlightServerMiddleware m; try { m = factory.factory.onCallStarted(info, headerAdapter); +currentContext = m.onAuthenticationSuccess(currentContext); Review comment: I created a facade for this and supplied it as an input to onCallStarted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] scober commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData
scober commented on a change in pull request #8040: URL: https://github.com/apache/arrow/pull/8040#discussion_r477475598 ## File path: cpp/src/parquet/metadata.h ## @@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData { /// \brief Total byte size of all the uncompressed column data in this row group. int64_t total_byte_size() const; + + // The file_offset field that this method exposes is optional. This method Review comment: Sure
[GitHub] [arrow] emkornfield commented on a change in pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData
emkornfield commented on a change in pull request #8040: URL: https://github.com/apache/arrow/pull/8040#discussion_r477472781 ## File path: cpp/src/parquet/metadata.h ## @@ -210,6 +210,12 @@ class PARQUET_EXPORT RowGroupMetaData { /// \brief Total byte size of all the uncompressed column data in this row group. int64_t total_byte_size() const; + + // The file_offset field that this method exposes is optional. This method Review comment: Please move this below \brief with a new line and use triple slash '///' This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] scober commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData
scober commented on pull request #8040: URL: https://github.com/apache/arrow/pull/8040#issuecomment-681018008 I updated the comments for the method declaration to address your documentation concerns. I did not change the default value of `file_offset` because I agree that 0 is wrong (though less obviously wrong than -1) and I did not want to chase down the unintended consequences of changing that.
[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on a change in pull request #8052: URL: https://github.com/apache/arrow/pull/8052#discussion_r477293832 ## File path: cpp/src/arrow/c/abi.h ## @@ -60,6 +60,31 @@ struct ArrowArray { void* private_data; }; +// EXPERIMENTAL +struct ArrowArrayStream { + // Callback to get the stream type + // (will be the same for all arrays in the stream). + // Return value: 0 if successful, an `errno`-compatible error code otherwise. + int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out); + // Callback to get the next array + // (if no error and the array is released, the stream has ended) + // Return value: 0 if successful, an `errno`-compatible error code otherwise. Review comment: I don't know how to phrase it: it returns values that are `errno` error codes (in case of error). A number of values are standard in C++: https://en.cppreference.com/w/cpp/error/errno_macros
[GitHub] [arrow] emkornfield commented on pull request #8040: ARROW-9824: [C++] Export file_offset in RowGroupMetaData
emkornfield commented on pull request #8040: URL: https://github.com/apache/arrow/pull/8040#issuecomment-681004709 > When you say documentation I assume you mean a comment near the method definition? Yes, the comment for the method definition. > And it looks the default value of this field is 0? Yes. I'm not sure if it breaks anything to make this -1 or something obviously wrong (0 should be wrong because I think all parquet files need to start with 'PAR1'). > And when you say the wording from the parquet specification do you mean the comment in the thrift definition? Yes: ``` Byte offset from beginning of file to first page (data or dictionary) in this row group ```
[GitHub] [arrow] pitrou commented on a change in pull request #7871: ARROW-9605: [C++] Speed up aggregate min/max compute kernels on integer types
pitrou commented on a change in pull request #7871: URL: https://github.com/apache/arrow/pull/7871#discussion_r477425988 ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) { AssertDatumsEqual(explicit_defaults, no_options_provided); } +template +using MinMaxResult = std::pair; + +template +static enable_if_integer> NaiveMinMax( +const Array& array) { + using T = typename ArrowType::c_type; + using ArrayType = typename TypeTraits::ArrayType; + + const auto& array_numeric = reinterpret_cast(array); + const auto values = array_numeric.raw_values(); + + if (array.length() <= array.null_count()) { // All null values +return {static_cast(0), static_cast(0)}; + } + + T min = std::numeric_limits::max(); + T max = std::numeric_limits::min(); + if (array.null_count() != 0) { // Some values are null +internal::BitmapReader reader(array.null_bitmap_data(), array.offset(), + array.length()); +for (int64_t i = 0; i < array.length(); i++) { + if (reader.IsSet()) { +min = std::min(min, values[i]); +max = std::max(max, values[i]); + } + reader.Next(); +} + } else { // All true values +for (int64_t i = 0; i < array.length(); i++) { + min = std::min(min, values[i]); + max = std::max(max, values[i]); +} + } + + return {min, max}; +} + +template +static enable_if_floating_point> NaiveMinMax( +const Array& array) { + using T = typename ArrowType::c_type; + using ArrayType = typename TypeTraits::ArrayType; + + const auto& array_numeric = reinterpret_cast(array); + const auto values = array_numeric.raw_values(); + + if (array.length() <= array.null_count()) { // All null values +return {static_cast(0), static_cast(0)}; + } + + T min = std::numeric_limits::infinity(); + T max = -std::numeric_limits::infinity(); + if (array.null_count() != 0) { // Some values are null +internal::BitmapReader reader(array.null_bitmap_data(), array.offset(), + array.length()); +for (int64_t i = 0; i < array.length(); i++) 
{ + if (reader.IsSet()) { +min = std::fmin(min, values[i]); +max = std::fmax(max, values[i]); + } + reader.Next(); +} + } else { // All true values +for (int64_t i = 0; i < array.length(); i++) { + min = std::fmin(min, values[i]); + max = std::fmax(max, values[i]); +} + } + + return {min, max}; +} + +template +void ValidateMinMax(const Array& array) { + using Traits = TypeTraits; + using ScalarType = typename Traits::ScalarType; + + ASSERT_OK_AND_ASSIGN(Datum out, MinMax(array)); + const StructScalar& value = out.scalar_as(); + + auto expected = NaiveMinMax(array); + const auto& out_min = checked_cast(*value.value[0]); + ASSERT_EQ(expected.first, out_min.value); + + const auto& out_max = checked_cast(*value.value[1]); + ASSERT_EQ(expected.second, out_max.value); +} + +template +class TestRandomNumericMinMaxKernel : public ::testing::Test {}; + +TYPED_TEST_SUITE(TestRandomNumericMinMaxKernel, NumericArrowTypes); +TYPED_TEST(TestRandomNumericMinMaxKernel, RandomArrayMinMax) { + auto rand = random::RandomArrayGenerator(0x8afc055); + // Test size up to 1<<13 (8192). Review comment: Sounds a bit large. Why not stop at e.g. 1024? 
## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -594,6 +594,113 @@ TYPED_TEST(TestFloatingMinMaxKernel, DefaultOptions) { AssertDatumsEqual(explicit_defaults, no_options_provided); } +template +using MinMaxResult = std::pair; + +template +static enable_if_integer> NaiveMinMax( +const Array& array) { + using T = typename ArrowType::c_type; + using ArrayType = typename TypeTraits::ArrayType; + + const auto& array_numeric = reinterpret_cast(array); + const auto values = array_numeric.raw_values(); + + if (array.length() <= array.null_count()) { // All null values +return {static_cast(0), static_cast(0)}; + } + + T min = std::numeric_limits::max(); + T max = std::numeric_limits::min(); + if (array.null_count() != 0) { // Some values are null +internal::BitmapReader reader(array.null_bitmap_data(), array.offset(), + array.length()); +for (int64_t i = 0; i < array.length(); i++) { + if (reader.IsSet()) { +min = std::min(min, values[i]); +max = std::max(max, values[i]); + } + reader.Next(); +} + } else { // All true values +for (int64_t i = 0; i < array.length(); i++) { + min = std::min(min, values[i]); + max = std::max(max, values[i]); +} + } + + return {min, max}; +} + +template +static enable_if_floating_point> NaiveMinMax( +const Array& array) { + using T = typename ArrowType::c_type; + using ArrayType = typename TypeTraits::ArrayType; + + const auto& array_numeric =
[GitHub] [arrow] pitrou commented on a change in pull request #7973: ARROW-8493: [C++][Parquet] Start populating repeated ancestor defintion
pitrou commented on a change in pull request #7973: URL: https://github.com/apache/arrow/pull/7973#discussion_r477352114 ## File path: cpp/src/parquet/level_conversion.h ## @@ -20,10 +20,117 @@ #include #include "parquet/platform.h" +#include "parquet/schema.h" namespace parquet { namespace internal { +struct PARQUET_EXPORT LevelInfo { + LevelInfo() + : null_slot_usage(1), def_level(0), rep_level(0), repeated_ancestor_def_level(0) {} + LevelInfo(int32_t null_slots, int32_t definition_level, int32_t repitition_level, +int32_t repeated_ancestor_definition_level) + : null_slot_usage(null_slots), +def_level(definition_level), +rep_level(repitition_level), +repeated_ancestor_def_level(repeated_ancestor_definition_level) {} + + bool operator==(const LevelInfo& b) const { +return null_slot_usage == b.null_slot_usage && def_level == b.def_level && + rep_level == b.rep_level && + repeated_ancestor_def_level == b.repeated_ancestor_def_level; + } + + // How many slots a null element consumes. + // This is only ever >1 for descendents of + // FixedSizeList. + int32_t null_slot_usage = 1; + + // The definition level at which the value for the field + // is considered not null (definition levels greater than + // or equal to indicate this value indicate a not-null + // value for the field). For list fields definition levels + // greater then or equal to this field indicate a present + // , possibly null, element. + int16_t def_level = 0; + + // The repetition level corresponding to this element + // or the closest repeated ancestor. Any repetition + // level less than this indicates either a new list OR + // an empty list (which is determined in conjunction + // definition_level). + int16_t rep_level = 0; + + // The definition level indicating the level at which the closest + // repeated ancestor was not empty. This is used to discriminate + // between a value less than |definition_level| + // being null or excluded entirely. 
+ // For instance if we have an arrow schema like: + // list(struct(f0: int)). Then then there are the following + // definition levels: + // 0 = null list + // 1 = present but empty list. + // 2 = a null value in the list + // 3 = a non null struct but null integer. + // 4 = a present integer. Review comment: +1, thanks for this example! ## File path: cpp/src/parquet/level_conversion.h ## @@ -20,10 +20,117 @@ #include #include "parquet/platform.h" +#include "parquet/schema.h" namespace parquet { namespace internal { +struct PARQUET_EXPORT LevelInfo { + LevelInfo() + : null_slot_usage(1), def_level(0), rep_level(0), repeated_ancestor_def_level(0) {} + LevelInfo(int32_t null_slots, int32_t definition_level, int32_t repitition_level, +int32_t repeated_ancestor_definition_level) + : null_slot_usage(null_slots), +def_level(definition_level), +rep_level(repitition_level), +repeated_ancestor_def_level(repeated_ancestor_definition_level) {} + + bool operator==(const LevelInfo& b) const { +return null_slot_usage == b.null_slot_usage && def_level == b.def_level && + rep_level == b.rep_level && + repeated_ancestor_def_level == b.repeated_ancestor_def_level; + } + + // How many slots a null element consumes. + // This is only ever >1 for descendents of + // FixedSizeList. + int32_t null_slot_usage = 1; + + // The definition level at which the value for the field + // is considered not null (definition levels greater than + // or equal to indicate this value indicate a not-null + // value for the field). For list fields definition levels + // greater then or equal to this field indicate a present + // , possibly null, element. + int16_t def_level = 0; + + // The repetition level corresponding to this element + // or the closest repeated ancestor. Any repetition + // level less than this indicates either a new list OR + // an empty list (which is determined in conjunction + // definition_level). 
+ int16_t rep_level = 0; + + // The definition level indicating the level at which the closest + // repeated ancestor was not empty. This is used to discriminate Review comment: Do you mean logical ancestor (in Arrow terms)? Or physical ancestor (in Parquet nesting)? ## File path: cpp/src/parquet/level_conversion.h ## @@ -20,10 +20,117 @@ #include #include "parquet/platform.h" +#include "parquet/schema.h" namespace parquet { namespace internal { +struct PARQUET_EXPORT LevelInfo { + LevelInfo() + : null_slot_usage(1), def_level(0), rep_level(0), repeated_ancestor_def_level(0) {} + LevelInfo(int32_t null_slots, int32_t definition_level, int32_t repitition_level, +int32_t repeated_ancestor_definition_level) + : null_slot_usage(null_slots), +def_level(definition_level), +
[GitHub] [arrow] jorisvandenbossche commented on pull request #8055: ARROW-7226: [Python][Doc] Add note re: JSON format support
jorisvandenbossche commented on pull request #8055: URL: https://github.com/apache/arrow/pull/8055#issuecomment-680919060 > Why would you expect something else? Because this is not a default "JSON" file, and many people will expect a "json reader" to support that. I know it is already mentioned in the text, but given this expectation, I think it is fine to have an additional explicit note about it.
[GitHub] [arrow] wqc200 closed pull request #8033: ARROW-9837: [Rust][DataFusion] Add provider for variable
wqc200 closed pull request #8033: URL: https://github.com/apache/arrow/pull/8033
[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
pitrou commented on a change in pull request #7992: URL: https://github.com/apache/arrow/pull/7992#discussion_r477341587 ## File path: cpp/src/arrow/ipc/metadata_internal.h ## @@ -48,6 +48,7 @@ namespace flatbuf = org::apache::arrow::flatbuf; namespace ipc { +class DictionaryFieldMapper; class DictionaryMemo; Review comment: `DictionaryMemo` is still passed in some IPC reading APIs, I think. I don't know if those are meant to be public.
[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
pitrou commented on a change in pull request #7992: URL: https://github.com/apache/arrow/pull/7992#discussion_r477340732 ## File path: cpp/src/arrow/ipc/dictionary.h ## @@ -21,96 +21,145 @@ #include #include -#include #include #include -#include "arrow/memory_pool.h" +#include "arrow/result.h" #include "arrow/status.h" +#include "arrow/type_fwd.h" #include "arrow/util/macros.h" #include "arrow/util/visibility.h" namespace arrow { +namespace ipc { + +class FieldPosition { + public: + FieldPosition() : parent_(NULLPTR), index_(-1), depth_(0) {} + + FieldPosition child(int index) const { return {this, index}; } + + std::vector path() const { +std::vector path(depth_); +const FieldPosition* cur = this; +for (int i = depth_ - 1; i >= 0; --i) { + path[i] = cur->index_; + cur = cur->parent_; +} +return path; + } + + protected: + FieldPosition(const FieldPosition* parent, int index) + : parent_(parent), index_(index), depth_(parent->depth_ + 1) {} + + const FieldPosition* parent_; + int index_; + int depth_; +}; -class Array; -class DataType; -class Field; -class RecordBatch; +/// \brief Map fields in a schema to dictionary ids +/// +/// The mapping is structural, i.e. the field path (as a vector of indices) +/// is associated to the dictionary id. Review comment: Will add. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
pitrou commented on a change in pull request #7992: URL: https://github.com/apache/arrow/pull/7992#discussion_r477341153

File path: cpp/src/arrow/ipc/dictionary.h

@@ -21,96 +21,145 @@ (same hunk as quoted above; the relevant part follows)

+class ARROW_EXPORT DictionaryFieldMapper {
+ public:
+  DictionaryFieldMapper();
+  explicit DictionaryFieldMapper(const Schema& schema);
+  ~DictionaryFieldMapper();

-namespace ipc {

+  Status AddSchemaFields(const Schema& schema);
+  Status AddField(int64_t id, std::vector<int> field_path);

-/// \brief Memoization data structure for assigning id numbers to
-/// dictionaries and tracking their current state through possible
-/// deltas in an IPC stream

+  Result<int64_t> GetFieldId(std::vector<int> field_path) const;
+
+  int num_fields() const;
+
+ private:
+  struct Impl;
+  std::unique_ptr<Impl> impl_;
+};
+
+using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
+
+/// \brief Memoization data structure for reading dictionaries from IPC streams
+///
+/// This structure tracks the following associations:
+/// - field position (structural) -> dictionary id
+/// - dictionary id -> value type
+/// - dictionary id -> dictionary (value) data
+///
+/// Together, they allow resolving dictionary data when reading an IPC stream,
+/// using metadata recorded in the schema message and data recorded in the
+/// dictionary batch messages (see ResolveDictionaries).
+///
+/// This structure isn't useful for writing an IPC stream, where only
+/// DictionaryFieldMapper is necessary.
 class ARROW_EXPORT DictionaryMemo {
  public:
-  using DictionaryVector = std::vector<std::pair<int64_t, std::shared_ptr<Array>>>;
   DictionaryMemo();
-  DictionaryMemo(DictionaryMemo&&) = default;
-  DictionaryMemo& operator=(DictionaryMemo&&) = default;
+  ~DictionaryMemo();
+
+  DictionaryFieldMapper& fields();

Review comment: Well, either this, or we duplicate the `DictionaryFieldMapper` APIs here. I have no particular preference.
[GitHub] [arrow] pitrou commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
pitrou commented on a change in pull request #7992: URL: https://github.com/apache/arrow/pull/7992#discussion_r477340451

File path: cpp/src/arrow/flight/internal.cc

@@ -465,11 +465,9 @@ Status FromProto(const pb::SchemaResult& pb_result, std::string* result) { }

 Status SchemaToString(const Schema& schema, std::string* out) {
-  // TODO(wesm): Do we care about better memory efficiency here?
   ipc::DictionaryMemo unused_dict_memo;

Review comment: Hmm... since there seemed to be nothing concerning about memory efficiency, I thought this TODO was obsolete. What did you have in mind when writing it?
[GitHub] [arrow] wesm commented on a change in pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
wesm commented on a change in pull request #7992: URL: https://github.com/apache/arrow/pull/7992#discussion_r474285549

File path: cpp/src/arrow/flight/internal.cc

@@ -465,11 +465,9 @@ Status FromProto(const pb::SchemaResult& pb_result, std::string* result) { }

 Status SchemaToString(const Schema& schema, std::string* out) {
-  // TODO(wesm): Do we care about better memory efficiency here?
   ipc::DictionaryMemo unused_dict_memo;

Review comment: This can be removed now?

File path: cpp/src/arrow/ipc/dictionary.h

@@ -21,96 +21,145 @@ (same FieldPosition hunk as quoted above, ending with:)

+/// \brief Map fields in a schema to dictionary ids
+///
+/// The mapping is structural, i.e. the field path (as a vector of indices)
+/// is associated to the dictionary id.

Review comment: It would be helpful to state that a dictionary id can be associated with multiple field paths

File path: cpp/src/arrow/ipc/metadata_internal.h

@@ -48,6 +48,7 @@
 namespace flatbuf = org::apache::arrow::flatbuf;

 namespace ipc {

+class DictionaryFieldMapper;
 class DictionaryMemo;

Review comment: Perhaps we should move these both to `ipc::internal`?
File path: cpp/src/arrow/ipc/dictionary.h

@@ -21,96 +21,145 @@ (same DictionaryFieldMapper/DictionaryMemo hunk as quoted above, ending with:)

+  ~DictionaryMemo();
+
+  DictionaryFieldMapper& fields();

Review comment: Is this necessary? It might be clearer to have `DictionaryFieldMapper*
[GitHub] [arrow] pitrou commented on pull request #7992: ARROW-9660: [C++] Revamp dictionary association in IPC
pitrou commented on pull request #7992: URL: https://github.com/apache/arrow/pull/7992#issuecomment-680893599

Rebased, fixed conflict.
[GitHub] [arrow] andygrove closed pull request #8043: IGNORE: [Rust] Add Cargo.lock
andygrove closed pull request #8043: URL: https://github.com/apache/arrow/pull/8043
[GitHub] [arrow] pitrou commented on pull request #8055: ARROW-7226: [Python][Doc] Add note re: JSON format support
pitrou commented on pull request #8055: URL: https://github.com/apache/arrow/pull/8055#issuecomment-680891968

Is this useful? The paragraph above contains this sentence:

> a JSON file consists of multiple JSON objects, one per line, representing individual data rows

Why would you expect something else?
[GitHub] [arrow] pitrou commented on pull request #8050: ARROW-9852: [C++] Validate dictionaries fully on IPC read
pitrou commented on pull request #8050: URL: https://github.com/apache/arrow/pull/8050#issuecomment-680887980

I think I'll revisit this after #7992.
[GitHub] [arrow] pitrou commented on a change in pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
pitrou commented on a change in pull request #8049: URL: https://github.com/apache/arrow/pull/8049#discussion_r477302093

File path: cpp/src/arrow/util/decimal_test.cc

@@ -384,10 +389,10 @@ static const ToStringTestData kToStringTestData[] = {
   {-1234567890123456789LL, 25, "-1.234567890123456789E-7"},
 };

-class Decimal128ToStringTest : public ::testing::TestWithParam<ToStringTestData> {};
+class Decimal128ToStringTest : public ::testing::TestWithParam<ToStringTestParam> {};

 TEST_P(Decimal128ToStringTest, ToString) {
-  const ToStringTestData& data = GetParam();
+  const ToStringTestParam& data = GetParam();

Review comment: Done.
[GitHub] [arrow] pitrou commented on pull request #8049: ARROW-9851: [C++] Disable AVX512 runtime paths with Valgrind
pitrou commented on pull request #8049: URL: https://github.com/apache/arrow/pull/8049#issuecomment-680880304

> We need to use lower case for ARROW_USER_SIMD_LEVEL

Hmm, that's a pity. Can't we use the same convention everywhere? (I would probably favour uppercase, since those names are generally abbreviations)
[GitHub] [arrow] pitrou commented on pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on pull request #8052: URL: https://github.com/apache/arrow/pull/8052#issuecomment-680876865

However, as said above, perhaps returning `-1` (rather than `0` for success or a positive `errno`-like value for errors) could mean an end of stream. Thoughts?
[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on a change in pull request #8052: URL: https://github.com/apache/arrow/pull/8052#discussion_r477295669

File path: cpp/src/arrow/c/abi.h

@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };

+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)

Review comment: A nullptr cannot be returned. The callback returns an `int`. However, we could say that returning `-1` means end of stream.
[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on a change in pull request #8052: URL: https://github.com/apache/arrow/pull/8052#discussion_r477293832

File path: cpp/src/arrow/c/abi.h

@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };

+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.

Review comment: I don't know how to phrase it: it returns values that are `errno` error codes (in case of error). A number of values are standard in C++: https://en.cppreference.com/w/cpp/error/errno_macros
[GitHub] [arrow] pitrou commented on a change in pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on a change in pull request #8052: URL: https://github.com/apache/arrow/pull/8052#discussion_r477294326

File path: cpp/src/arrow/c/abi.h

@@ -60,6 +60,31 @@ struct ArrowArray {
   void* private_data;
 };

+// EXPERIMENTAL
+struct ArrowArrayStream {
+  // Callback to get the stream type
+  // (will be the same for all arrays in the stream).
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
+  // Callback to get the next array
+  // (if no error and the array is released, the stream has ended)
+  // Return value: 0 if successful, an `errno`-compatible error code otherwise.
+  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
+
+  // Callback to get optional detailed error information.
+  // This must only be called if the last stream operation failed
+  // with a non-0 return code. The returned pointer is only valid until
+  // the next operation on this stream (including release).
+  // If unavailable, NULL is returned.
+  const char* (*get_last_error)(struct ArrowArrayStream*);
+
+  // Release callback: release the stream's own resources.
+  // Note that arrays returned by `get_next` must be individually released.

Review comment: Hmm... I'd say no.
[GitHub] [arrow] vivkong commented on a change in pull request #8047: ARROW-9844: [CI] Add Go build job on s390x
vivkong commented on a change in pull request #8047: URL: https://github.com/apache/arrow/pull/8047#discussion_r477266332

File path: .travis.yml

@@ -56,6 +56,14 @@ jobs:
         UBUNTU: "20.04"
         cares_SOURCE: "BUNDLED"
         gRPC_SOURCE: "BUNDLED"
+    - name: "Go on s390x"
+      os: linux
+      arch: s390x
+      env:
+        ARCH: s390x
+        ARROW_CI_MODULES: "GO"
+        DOCKER_IMAGE_ID: debian-go
+        GO: "1.12"

Review comment: Thanks for your feedback. I've removed the line.
[GitHub] [arrow] pitrou commented on pull request #8052: ARROW-9761: [C/C++] Add experimental C ArrowArrayStream ABI
pitrou commented on pull request #8052: URL: https://github.com/apache/arrow/pull/8052#issuecomment-680762232

It's possible to have some chunks with 0 length in a stream, yes. I don't see any reason to forbid it in this API (and such corner cases are often annoying to deal with).
[GitHub] [arrow] pitrou commented on a change in pull request #8036: ARROW-9811: [C++] Unchecked floating point division by 0 should succeed
pitrou commented on a change in pull request #8036: URL: https://github.com/apache/arrow/pull/8036#discussion_r477141994

File path: cpp/src/arrow/compare.cc

@@ -875,9 +879,26 @@ class ScalarEqualsVisitor {
     return Status::OK();
   }

+  template
+  typename std::enable_if::value ||
+                              std::is_base_of::value,
+                          Status>::type
+  Visit(const T& left_) {
+    const auto& right = checked_cast(right_);
+    if (options_.nans_equal()) {
+      result_ = right.value == left_.value ||
+                (std::isnan(right.value) && std::isnan(left_.value));

Review comment: `-NAN` isn't really a thing. As per IEEE, every NaN is different and unequal (even to itself). Here we treat them as all equal, because that's vastly more convenient for testing.
[GitHub] [arrow] thamht4190 commented on a change in pull request #8023: ARROW-9318: [C++] Parquet encryption key management
thamht4190 commented on a change in pull request #8023: URL: https://github.com/apache/arrow/pull/8023#discussion_r477067068

File path: cpp/src/parquet/encryption_internal.h

@@ -45,6 +45,9 @@ constexpr int8_t kOffsetIndex = 7;

 /// Performs AES encryption operations with GCM or CTR ciphers.
 class AesEncryptor {
  public:
+  /// Can serve one key length only. Possible values: 16, 24, 32 bytes.
+  explicit AesEncryptor(ParquetCipher::type alg_id, int key_len, bool metadata);

Review comment: @emkornfield I want to create a local variable of AesEncryptor without using smart pointers, so it is stored on the stack rather than the heap.
[GitHub] [arrow] github-actions[bot] commented on pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform
github-actions[bot] commented on pull request #8057: URL: https://github.com/apache/arrow/pull/8057#issuecomment-680687276

https://issues.apache.org/jira/browse/ARROW-9862
[GitHub] [arrow] kiszk opened a new pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform
kiszk opened a new pull request #8057: URL: https://github.com/apache/arrow/pull/8057

This PR enables the `UnsafeDirectLittleEndian` class by removing the exception it threw on a big-endian platform. This is because the class originally supports primitive data types (up to 64-bit) in native endianness.
[GitHub] [arrow] github-actions[bot] commented on pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector
github-actions[bot] commented on pull request #8056: URL: https://github.com/apache/arrow/pull/8056#issuecomment-680684161

https://issues.apache.org/jira/browse/ARROW-9861
[GitHub] [arrow] kiszk opened a new pull request #8056: ARROW-9861: [Java] Support big-endian in DecimalVector
kiszk opened a new pull request #8056: URL: https://github.com/apache/arrow/pull/8056

This PR fixes failures in DecimalVectorTest on a big-endian platform.