[GitHub] [arrow] Bei-z commented on a change in pull request #8542: ARROW-10407: [C++] Add BasicDecimal256 Division Support
Bei-z commented on a change in pull request #8542: URL: https://github.com/apache/arrow/pull/8542#discussion_r521719927 ## File path: cpp/src/arrow/util/decimal_test.cc ## @@ -1296,6 +1296,56 @@ TEST(Decimal256Test, Multiply) { } } +TEST(Decimal256Test, Divide) { + ASSERT_EQ(Decimal256(33), Decimal256(100) / Decimal256(3)); + ASSERT_EQ(Decimal256(66), Decimal256(200) / Decimal256(3)); + ASSERT_EQ(Decimal256(66), Decimal256(20100) / Decimal256(301)); + ASSERT_EQ(Decimal256(-66), Decimal256(-20100) / Decimal256(301)); + ASSERT_EQ(Decimal256(-66), Decimal256(20100) / Decimal256(-301)); + ASSERT_EQ(Decimal256(66), Decimal256(-20100) / Decimal256(-301)); + ASSERT_EQ(Decimal256("-5192296858534827628530496329343552"), + Decimal256("-269599466671506397946670150910580797473777870509761363" + "24636208709184") / +Decimal256("5192296858534827628530496329874417")); + ASSERT_EQ(Decimal256("5192296858534827628530496329343552"), + Decimal256("-269599466671506397946670150910580797473777870509761363" + "24636208709184") / +Decimal256("-5192296858534827628530496329874417")); + ASSERT_EQ(Decimal256("5192296858534827628530496329343552"), + Decimal256("2695994666715063979466701509105807974737778705097613632" + "4636208709184") / +Decimal256("5192296858534827628530496329874417")); + ASSERT_EQ(Decimal256("-5192296858534827628530496329343552"), + Decimal256("2695994666715063979466701509105807974737778705097613632" + "4636208709184") / +Decimal256("-5192296858534827628530496329874417")); + + // Test some random numbers. + for (auto x : GetRandomNumbers(16)) { +for (auto y : GetRandomNumbers(16)) { + if (y == 0) { +continue; + } + + Decimal256 result = Decimal256(x) / Decimal256(y); + ASSERT_EQ(Decimal256(static_cast(x) / y), result) + << " x: " << x << " y: " << y; +} + } + + // Test some edge cases + for (auto x : std::vector{-INT64_MAX, -INT32_MAX, 0, INT32_MAX, INT64_MAX}) { +for (auto y : std::vector{-INT32_MAX, -32, -2, -1, 1, 2, 32, INT32_MAX}) { Review comment: Changed accordingly. Thank you! ## File path: cpp/src/arrow/util/basic_decimal.cc ## @@ -549,24 +603,27 @@ static DecimalStatus SingleDivide(const uint32_t* dividend, int64_t dividend_len return DecimalStatus::kSuccess; } -DecimalStatus BasicDecimal128::Divide(const BasicDecimal128& divisor, - BasicDecimal128* result, - BasicDecimal128* remainder) const { +/// \brief Do a division where the divisor fits into a single 32 bit value. +template +static inline DecimalStatus DecimalDivide(const DecimalClass& dividend, + const DecimalClass& divisor, + DecimalClass* result, DecimalClass* remainder) { + constexpr int64_t kDecimalArrayLength = sizeof(DecimalClass) / sizeof(uint32_t); Review comment: added DecimalClass::bit_width. Thx! ## File path: cpp/src/arrow/util/basic_decimal.cc ## @@ -549,24 +603,27 @@ static DecimalStatus SingleDivide(const uint32_t* dividend, int64_t dividend_len return DecimalStatus::kSuccess; } -DecimalStatus BasicDecimal128::Divide(const BasicDecimal128& divisor, - BasicDecimal128* result, - BasicDecimal128* remainder) const { +/// \brief Do a division where the divisor fits into a single 32 bit value. Review comment: Thank you for catching this! This is a typo caused by copy & paste. ## File path: cpp/src/arrow/util/decimal_test.cc ## @@ -40,6 +40,9 @@ namespace arrow { using internal::int128_t; using internal::uint128_t; +constexpr int128_t kInt128Max = (static_cast(INT64_MAX) << 64) - 1 + +2 * (static_cast(INT64_MAX) + 1); Review comment: Changed accordingly. Thx! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ivotron opened a new pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset
ivotron opened a new pull request #8647: URL: https://github.com/apache/arrow/pull/8647 This PR contains a basic implementation of the Dataset API on Ceph that uses the librados C++ API to defer evaluation of expressions to a RADOS storage backend. The storage-side code is included, as well as unit and integration tests. The Dataset implementation on RADOS is done by adding new RadosDataset and RadosFragment classes. A scanning operation triggers the evaluation of expressions on the storage-side. The PR includes a wrapper for the librados library, as well as a mock, that allows to run unit tests without having a Ceph instance. Integration tests have been modified in order to install Ceph and run without the mocks (running tests against a single-node Ceph "cluster"). The storage-side code is implemented as a RADOS CLS (object storage class) using Ceph's [RADOS SDK](https://docs.ceph.com/en/octopus/architecture/#extending-ceph). The code lives in `cpp/src/arrow/adapters/arrow-rados-cls/`, and is expected to be deployed on the storage nodes (Ceph's OSDs) prior to operating on tables through the RadosDataset implementation. This PR includes a cmake configuration for building this library if desired (`ARROW_CLS` cmake option). Follow up work includes: dataset discovery, python bindings, large fragment stripping, IPC improvements on the backend, and a python library for writing tables. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson edited a comment on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function
nealrichardson edited a comment on pull request #8256: URL: https://github.com/apache/arrow/pull/8256#issuecomment-725644439 @kszucs For some reason the `test-r-linux-as-cran` job (branch) isn't being created on crossbow. Unfortunately the crossbow output (e.g. https://github.com/apache/arrow/runs/1386912319?check_suite_focus=true) doesn't provide any information about it. FWIW it is still being triggered on the nightly builds: https://github.com/ursa-labs/crossbow/branches/all?query=test-r-linux-as-cran This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #8579: ARROW-10481: [R] Bindings to add, remove, replace Table columns
nealrichardson closed pull request #8579: URL: https://github.com/apache/arrow/pull/8579 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8644: [WIP] Align written buffers to specified value
github-actions[bot] commented on pull request #8644: URL: https://github.com/apache/arrow/pull/8644#issuecomment-725703424 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could you also rename pull request title in the following format? ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY} See also: * [Other pull requests](https://github.com/apache/arrow/pulls/) * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] carols10cents commented on pull request #8641: ARROW-8853: [Rust] [Integration Testing] Enable Flight tests
carols10cents commented on pull request #8641: URL: https://github.com/apache/arrow/pull/8641#issuecomment-725647689 The archery unit tests [are failing with this error](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664293#step:7:19): ``` E ImportError: cannot import name 'HTTPHeaderDict' ``` [In the step before the unit tests are run, there's this](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664293#step:6:106): ``` ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts. Successfully installed MarkupSafe-1.1.1 archery attrs-20.3.0 certifi-2020.11.8 cffi-1.14.3 chardet-3.0.4 click-7.1.2 cryptography-3.2.1 defusedxml-0.6.0 deprecated-1.2.10 gitdb-4.0.5 gitpython-3.1.11 idna-2.10 importlib-metadata-2.0.0 iniconfig-1.1.1 jinja2-2.11.2 jira-2.0.0 numpy-1.18.5 oauthlib-3.1.0 packaging-20.4 pandas-0.25.3 pathlib2-2.3.5 pbr-5.5.1 pluggy-0.13.1 py-1.9.0 pycparser-2.20 pygithub-1.53 pyjwt-1.7.1 pyparsing-2.4.7 pytest-6.1.2 python-dateutil-2.8.1 python-dotenv-0.15.0 pytz-2020.4 requests-2.24.0 requests-oauthlib-1.3.0 requests-toolbelt-0.9.1 responses-0.12.0 ruamel.yaml-0.16.12 ruamel.yaml.clib-0.2.2 semver-2.13.0 six-1.15.0 smmap-3.0.4 toml-0.10.2 toolz-0.11.1 urllib3-1.26.0 wrapt-1.12.1 zipp-1.2.0 We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default. requests 2.24.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you'll have urllib3 1.26.0 which is incompatible. ``` I'm not very familiar with Python but given it's November this seems likely...? I'm not sure the correct way to fix this, and I don't see any open PRs about it from a quick skim? The [other failing test job says it's out of disk space](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664394#step:8:1499), is there anything I can do about that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson opened a new pull request #8643: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()
nealrichardson opened a new pull request #8643: URL: https://github.com/apache/arrow/pull/8643 Replaces #8616, which I accidentally git broke. cc @ianmcook This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian
wesm commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r521655583 ## File path: cpp/src/arrow/type.h ## @@ -1604,13 +1605,26 @@ class ARROW_EXPORT FieldRef { // -- // Schema +enum class Endianness { + LITTLE = 0, + BIG = 1, +#if ARROW_LITTLE_ENDIAN + NATIVE = LITTLE +#else + NATIVE = BIG +#endif +}; Review comment: I will defer to @pitrou or @emkornfield on what to do -- I don't have a lot of bandwidth this month to help with writing PRs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] vertexclique commented on pull request #8635: ARROW-10551: [Rust] Fix unreproducible benches
vertexclique commented on pull request #8635: URL: https://github.com/apache/arrow/pull/8635#issuecomment-725714558 @alamb I am unsure about the seedable_rng behavior right now. But let's go with this one. You can check the results here: https://gist.github.com/vertexclique/44158100c6c019d3d6f02ef94606d91a This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8645: ARROW-10561: [Rust] Small simplification of `write` and `write_bytes`
github-actions[bot] commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-725727007 https://issues.apache.org/jira/browse/ARROW-10561 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function
nealrichardson commented on pull request #8256: URL: https://github.com/apache/arrow/pull/8256#issuecomment-725644439 @kszucs For some reason the `test-r-linux-as-cran` job (branch) isn't being created on crossbow. Unfortunately the crossbow output (e.g. https://github.com/apache/arrow/runs/1386912319?check_suite_focus=true) doesn't provide any information about it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8643: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()
github-actions[bot] commented on pull request #8643: URL: https://github.com/apache/arrow/pull/8643#issuecomment-725660537 https://issues.apache.org/jira/browse/ARROW-10522 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kou commented on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function
kou commented on pull request #8256: URL: https://github.com/apache/arrow/pull/8256#issuecomment-725660754 FYI: https://github.com/apache/arrow/pull/8386#issuecomment-725085209 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] njwhite opened a new pull request #8644: [WIP] Align written buffers to specified value
njwhite opened a new pull request #8644: URL: https://github.com/apache/arrow/pull/8644 See mailing list discussion for now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel
cyb70289 commented on a change in pull request #8637: URL: https://github.com/apache/arrow/pull/8637#discussion_r521776788 ## File path: docs/source/cpp/compute.rst ## @@ -140,7 +140,7 @@ Aggregations +--+++---++ | min_max | Unary | Numeric| Scalar Struct (1)| :struct:`MinMaxOptions`| +--+++---++ -| mode | Unary | Numeric| Scalar Struct (2)|| +| mode | Unary | Numeric| Scalar Struct (2)| :struct:`ModeOptions` | Review comment: Mode now returns an array of struct. Should it be "Struct Array"? or simply "Struct'. Should I also change above line (min_max) to "Struct"? or leave it "Scalar Struct". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #8639: ARROW-10559: [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules
jorgecarleitao closed pull request #8639: URL: https://github.com/apache/arrow/pull/8639 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset
github-actions[bot] commented on pull request #8647: URL: https://github.com/apache/arrow/pull/8647#issuecomment-725759470 https://issues.apache.org/jira/browse/ARROW-10549 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8648: ARROW-7906: [C++] Add ORC write support
github-actions[bot] commented on pull request #8648: URL: https://github.com/apache/arrow/pull/8648#issuecomment-725811236 https://issues.apache.org/jira/browse/ARROW-7906 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] mathyingzhou opened a new pull request #8648: ARROW-7906: [C++] Add ORC write support
mathyingzhou opened a new pull request #8648: URL: https://github.com/apache/arrow/pull/8648 This pull request tracks the progress on adding ORC write support. The functionality is not complete yet. However for most types the process of populating a ColumnVectorBatch in ORC using data from Arrow Array. Arrow data types (arrow::Type::type) I do support: Boolean: BOOL Numerical: INT8, INT16, INT32, INT64, FLOAT, DOUBLE Time-related: DATE32 Binary: BINARY, STRING, LARGE_BINARY, LARGE_STRING, FIXED_SIZE_BINARY Nested: LIST, LARGE_LIST, FIXED_SIZE_LIST, STRUCT, MAP, DENSE_UNION, SPARSE_UNION Arrow data types I plan to support: Numerical: DECIMAL128 Time-related: DATE64, TIMESTAMP Dictionary: DICTIONARY Arrow data types I currently do NOT plan to support: Numerical: UINT8, UINT16, UINT32, UINT64, HALF_FLOAT, DECIMAL256 (There are no corresponding types in ORC. Of course except for in the case of DECIMAL256 we can always cast them into larger types. However I think maybe users need to explicitly do that.) Time-related: TIME32, TIME64, INTERVAL_MONTHS, INTERVAL_DAY_TIME, DURATION (There are no corresponding types in ORC and it is impossible to cast them into ORC types without losing time-related information) Extension: EXTENSION This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] arw2019 commented on pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data
arw2019 commented on pull request #8294: URL: https://github.com/apache/arrow/pull/8294#issuecomment-725852668 Turning to draft while I work on the rebase This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r521763175 ## File path: cpp/src/arrow/type.h ## @@ -1604,13 +1605,26 @@ class ARROW_EXPORT FieldRef { // -- // Schema +enum class Endianness { + LITTLE = 0, + BIG = 1, +#if ARROW_LITTLE_ENDIAN + NATIVE = LITTLE +#else + NATIVE = BIG +#endif +}; Review comment: I see. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel
cyb70289 commented on a change in pull request #8637: URL: https://github.com/apache/arrow/pull/8637#discussion_r521865256 ## File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc ## @@ -205,35 +243,59 @@ struct ModeImpl : public ScalarAggregator { this->state.MergeFrom(std::move(other.state)); } - void Finalize(KernelContext*, Datum* out) override { -using ModeType = typename TypeTraits::ScalarType; -using CountType = typename TypeTraits::ScalarType; + static std::shared_ptr MakeArrayData( + const std::shared_ptr& data_type, int64_t n) { +auto data = ArrayData::Make(data_type, n, 0); +data->buffers.resize(2); +data->buffers[0] = nullptr; +return data; + } -std::vector> values; -auto mode_count = this->state.Finalize(); -auto mode = mode_count.first; -auto count = mode_count.second; -if (count == 0) { - values = {std::make_shared(), std::make_shared()}; -} else { - values = {std::make_shared(mode), std::make_shared(count)}; + void Finalize(KernelContext* ctx, Datum* out) override { +const auto& mode_type = TypeTraits::type_singleton(); +const auto& count_type = int64(); +const auto& out_type = +struct_({field(kModeFieldName, mode_type), field(kCountFieldName, count_type)}); + +int n = this->options.n; +if (n > state.DistinctValues()) { + n = state.DistinctValues(); +} +if (n <= 0) { + *out = Datum(StructArray(out_type, 0, ArrayVector{}).data()); Review comment: Done ## File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc ## @@ -205,35 +243,59 @@ struct ModeImpl : public ScalarAggregator { this->state.MergeFrom(std::move(other.state)); } - void Finalize(KernelContext*, Datum* out) override { -using ModeType = typename TypeTraits::ScalarType; -using CountType = typename TypeTraits::ScalarType; + static std::shared_ptr MakeArrayData( + const std::shared_ptr& data_type, int64_t n) { +auto data = ArrayData::Make(data_type, n, 0); +data->buffers.resize(2); +data->buffers[0] = nullptr; +return data; + } -std::vector> values; -auto mode_count = this->state.Finalize(); -auto mode = mode_count.first; -auto count = mode_count.second; -if (count == 0) { - values = {std::make_shared(), std::make_shared()}; -} else { - values = {std::make_shared(mode), std::make_shared(count)}; + void Finalize(KernelContext* ctx, Datum* out) override { +const auto& mode_type = TypeTraits::type_singleton(); +const auto& count_type = int64(); +const auto& out_type = +struct_({field(kModeFieldName, mode_type), field(kCountFieldName, count_type)}); + +int n = this->options.n; +if (n > state.DistinctValues()) { + n = state.DistinctValues(); +} +if (n <= 0) { + *out = Datum(StructArray(out_type, 0, ArrayVector{}).data()); + return; } -out->value = std::make_shared(std::move(values), this->out_type); + +auto mode_data = this->MakeArrayData(mode_type, n); +auto count_data = this->MakeArrayData(count_type, n); +KERNEL_ASSIGN_OR_RAISE(mode_data->buffers[1], ctx, ctx->Allocate(n * sizeof(CType))); +KERNEL_ASSIGN_OR_RAISE(count_data->buffers[1], ctx, + ctx->Allocate(n * sizeof(int64_t))); + +CType* mode_buffer = mode_data->template GetMutableValues(1); +int64_t* count_buffer = count_data->template GetMutableValues(1); +this->state.Finalize(mode_buffer, count_buffer, n); + +*out = Datum( +StructArray(out_type, n, ArrayVector{MakeArray(mode_data), MakeArray(count_data)}) Review comment: Done ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -735,71 +735,63 @@ class TestPrimitiveModeKernel : public ::testing::Test { public: using ArrowType = T; using Traits = TypeTraits; - using c_type = typename ArrowType::c_type; - using ModeType = typename Traits::ScalarType; - using CountType = typename TypeTraits::ScalarType; - - void AssertModeIs(const Datum& array, c_type expected_mode, int64_t expected_count) { -ASSERT_OK_AND_ASSIGN(Datum out, Mode(array)); -const StructScalar& value = out.scalar_as(); + using CType = typename ArrowType::c_type; + + void AssertModesAre(const Datum& array, const int n, + const std::vector& expected_modes, + const std::vector& expected_counts) { +ASSERT_OK_AND_ASSIGN(Datum out, Mode(array, ModeOptions{n})); Review comment: Done ## File path: cpp/src/arrow/compute/kernels/aggregate_test.cc ## @@ -735,71 +735,63 @@ class TestPrimitiveModeKernel : public ::testing::Test { public: using ArrowType = T; using Traits = TypeTraits; - using c_type = typename ArrowType::c_type; - using ModeType = typename Traits::ScalarType; - using CountType = typename TypeTraits::ScalarType; - - void AssertModeIs(const
[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel
cyb70289 commented on a change in pull request #8637: URL: https://github.com/apache/arrow/pull/8637#discussion_r521865103 ## File path: cpp/src/arrow/compute/api_aggregate.h ## @@ -76,6 +76,18 @@ struct ARROW_EXPORT MinMaxOptions : public FunctionOptions { enum Mode null_handling; }; +/// \brief Control Mode kernel behavior +/// +/// Returns top-n common values and counts. +/// By default, returns the most common value and count. +struct ARROW_EXPORT ModeOptions : public FunctionOptions { + explicit ModeOptions(int n = 1) : n(n) {} Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel
cyb70289 commented on a change in pull request #8637: URL: https://github.com/apache/arrow/pull/8637#discussion_r521865178 ## File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc ## @@ -28,6 +29,9 @@ namespace internal { namespace { +constexpr char kModeFieldName[] = "modes"; +constexpr char kCountFieldName[] = "counts"; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cyb70289 commented on pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel
cyb70289 commented on pull request #8637: URL: https://github.com/apache/arrow/pull/8637#issuecomment-725891740 Python unit test failed. I'm updating mode kernel python binding per cpp changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org