[GitHub] [arrow] Bei-z commented on a change in pull request #8542: ARROW-10407: [C++] Add BasicDecimal256 Division Support

2020-11-11 Thread GitBox


Bei-z commented on a change in pull request #8542:
URL: https://github.com/apache/arrow/pull/8542#discussion_r521719927



##
File path: cpp/src/arrow/util/decimal_test.cc
##
@@ -1296,6 +1296,56 @@ TEST(Decimal256Test, Multiply) {
   }
 }
 
+TEST(Decimal256Test, Divide) {
+  ASSERT_EQ(Decimal256(33), Decimal256(100) / Decimal256(3));
+  ASSERT_EQ(Decimal256(66), Decimal256(200) / Decimal256(3));
+  ASSERT_EQ(Decimal256(66), Decimal256(20100) / Decimal256(301));
+  ASSERT_EQ(Decimal256(-66), Decimal256(-20100) / Decimal256(301));
+  ASSERT_EQ(Decimal256(-66), Decimal256(20100) / Decimal256(-301));
+  ASSERT_EQ(Decimal256(66), Decimal256(-20100) / Decimal256(-301));
+  ASSERT_EQ(Decimal256("-5192296858534827628530496329343552"),
+
Decimal256("-269599466671506397946670150910580797473777870509761363"
+   "24636208709184") /
+Decimal256("5192296858534827628530496329874417"));
+  ASSERT_EQ(Decimal256("5192296858534827628530496329343552"),
+
Decimal256("-269599466671506397946670150910580797473777870509761363"
+   "24636208709184") /
+Decimal256("-5192296858534827628530496329874417"));
+  ASSERT_EQ(Decimal256("5192296858534827628530496329343552"),
+
Decimal256("2695994666715063979466701509105807974737778705097613632"
+   "4636208709184") /
+Decimal256("5192296858534827628530496329874417"));
+  ASSERT_EQ(Decimal256("-5192296858534827628530496329343552"),
+
Decimal256("2695994666715063979466701509105807974737778705097613632"
+   "4636208709184") /
+Decimal256("-5192296858534827628530496329874417"));
+
+  // Test some random numbers.
+  for (auto x : GetRandomNumbers(16)) {
+for (auto y : GetRandomNumbers(16)) {
+  if (y == 0) {
+continue;
+  }
+
+  Decimal256 result = Decimal256(x) / Decimal256(y);
+  ASSERT_EQ(Decimal256(static_cast(x) / y), result)
+  << " x: " << x << " y: " << y;
+}
+  }
+
+  // Test some edge cases
+  for (auto x : std::vector{-INT64_MAX, -INT32_MAX, 0, INT32_MAX, 
INT64_MAX}) {
+for (auto y : std::vector{-INT32_MAX, -32, -2, -1, 1, 2, 32, 
INT32_MAX}) {

Review comment:
   Changed accordingly. Thank you!

##
File path: cpp/src/arrow/util/basic_decimal.cc
##
@@ -549,24 +603,27 @@ static DecimalStatus SingleDivide(const uint32_t* 
dividend, int64_t dividend_len
   return DecimalStatus::kSuccess;
 }
 
-DecimalStatus BasicDecimal128::Divide(const BasicDecimal128& divisor,
-  BasicDecimal128* result,
-  BasicDecimal128* remainder) const {
+/// \brief Do a division where the divisor fits into a single 32 bit value.
+template 
+static inline DecimalStatus DecimalDivide(const DecimalClass& dividend,
+  const DecimalClass& divisor,
+  DecimalClass* result, DecimalClass* 
remainder) {
+  constexpr int64_t kDecimalArrayLength = sizeof(DecimalClass) / 
sizeof(uint32_t);

Review comment:
   added DecimalClass::bit_width. Thx!

##
File path: cpp/src/arrow/util/basic_decimal.cc
##
@@ -549,24 +603,27 @@ static DecimalStatus SingleDivide(const uint32_t* 
dividend, int64_t dividend_len
   return DecimalStatus::kSuccess;
 }
 
-DecimalStatus BasicDecimal128::Divide(const BasicDecimal128& divisor,
-  BasicDecimal128* result,
-  BasicDecimal128* remainder) const {
+/// \brief Do a division where the divisor fits into a single 32 bit value.

Review comment:
   Thank you for catching this! This is a typo caused by copy & paste.

##
File path: cpp/src/arrow/util/decimal_test.cc
##
@@ -40,6 +40,9 @@ namespace arrow {
 using internal::int128_t;
 using internal::uint128_t;
 
+constexpr int128_t kInt128Max = (static_cast(INT64_MAX) << 64) - 1 +
+2 * (static_cast(INT64_MAX) + 1);

Review comment:
   Changed accordingly. Thx!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] ivotron opened a new pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

2020-11-11 Thread GitBox


ivotron opened a new pull request #8647:
URL: https://github.com/apache/arrow/pull/8647


   This PR contains a basic implementation of the Dataset API on Ceph that uses 
the librados C++ API to defer evaluation of expressions to a RADOS storage 
backend. The storage-side code is included, as well as unit and integration 
tests.
   
   The Dataset implementation on RADOS is done by adding new RadosDataset and 
RadosFragment classes. A scanning operation triggers the evaluation of 
expressions on the storage-side. The PR includes a wrapper for the librados 
library, as well as a mock, that allows to run unit tests without having a Ceph 
instance. Integration tests have been modified in order to install Ceph and run 
without the mocks (running tests against a single-node Ceph "cluster").
   
   The storage-side code is implemented as a RADOS CLS (object storage class) 
using Ceph's [RADOS 
SDK](https://docs.ceph.com/en/octopus/architecture/#extending-ceph). The code 
lives in `cpp/src/arrow/adapters/arrow-rados-cls/`, and is expected to be 
deployed on the storage nodes (Ceph's OSDs) prior to operating on tables 
through the RadosDataset implementation. This PR includes a cmake configuration 
for building this library if desired (`ARROW_CLS` cmake option).
   
   Follow up work includes: dataset discovery, python bindings, large fragment 
stripping, IPC improvements on the backend, and a python library for writing 
tables.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson edited a comment on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function

2020-11-11 Thread GitBox


nealrichardson edited a comment on pull request #8256:
URL: https://github.com/apache/arrow/pull/8256#issuecomment-725644439


   @kszucs For some reason the `test-r-linux-as-cran` job (branch) isn't being 
created on crossbow. Unfortunately the crossbow output (e.g. 
https://github.com/apache/arrow/runs/1386912319?check_suite_focus=true) doesn't 
provide any information about it.
   
   FWIW it is still being triggered on the nightly builds: 
https://github.com/ursa-labs/crossbow/branches/all?query=test-r-linux-as-cran



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson closed pull request #8579: ARROW-10481: [R] Bindings to add, remove, replace Table columns

2020-11-11 Thread GitBox


nealrichardson closed pull request #8579:
URL: https://github.com/apache/arrow/pull/8579


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8644: [WIP] Align written buffers to specified value

2020-11-11 Thread GitBox


github-actions[bot] commented on pull request #8644:
URL: https://github.com/apache/arrow/pull/8644#issuecomment-725703424


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] carols10cents commented on pull request #8641: ARROW-8853: [Rust] [Integration Testing] Enable Flight tests

2020-11-11 Thread GitBox


carols10cents commented on pull request #8641:
URL: https://github.com/apache/arrow/pull/8641#issuecomment-725647689


   The archery unit tests [are failing with this 
error](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664293#step:7:19):
   
   ```
   E   ImportError: cannot import name 'HTTPHeaderDict'
   ```
   
   [In the step before the unit tests are run, there's 
this](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664293#step:6:106):
   
   ```
   ERROR: After October 2020 you may experience errors when installing or 
updating packages. This is because pip will change the way that it resolves 
dependency conflicts.
   Successfully installed MarkupSafe-1.1.1 archery attrs-20.3.0 
certifi-2020.11.8 cffi-1.14.3 chardet-3.0.4 click-7.1.2 cryptography-3.2.1 
defusedxml-0.6.0 deprecated-1.2.10 gitdb-4.0.5 gitpython-3.1.11 idna-2.10 
importlib-metadata-2.0.0 iniconfig-1.1.1 jinja2-2.11.2 jira-2.0.0 numpy-1.18.5 
oauthlib-3.1.0 packaging-20.4 pandas-0.25.3 pathlib2-2.3.5 pbr-5.5.1 
pluggy-0.13.1 py-1.9.0 pycparser-2.20 pygithub-1.53 pyjwt-1.7.1 pyparsing-2.4.7 
pytest-6.1.2 python-dateutil-2.8.1 python-dotenv-0.15.0 pytz-2020.4 
requests-2.24.0 requests-oauthlib-1.3.0 requests-toolbelt-0.9.1 
responses-0.12.0 ruamel.yaml-0.16.12 ruamel.yaml.clib-0.2.2 semver-2.13.0 
six-1.15.0 smmap-3.0.4 toml-0.10.2 toolz-0.11.1 urllib3-1.26.0 wrapt-1.12.1 
zipp-1.2.0
   
   We recommend you use --use-feature=2020-resolver to test your packages with 
the new resolver before it becomes the default.
   
   requests 2.24.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you'll 
have urllib3 1.26.0 which is incompatible.
   ```
   
   I'm not very familiar with Python but given it's November this seems 
likely...? I'm not sure the correct way to fix this, and I don't see any open 
PRs about it from a quick skim?
   
   The [other failing test job says it's out of disk 
space](https://github.com/apache/arrow/pull/8641/checks?check_run_id=1386664394#step:8:1499),
 is there anything I can do about that?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson opened a new pull request #8643: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()

2020-11-11 Thread GitBox


nealrichardson opened a new pull request #8643:
URL: https://github.com/apache/arrow/pull/8643


   Replaces #8616, which I accidentally git broke. cc @ianmcook



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-11-11 Thread GitBox


wesm commented on a change in pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#discussion_r521655583



##
File path: cpp/src/arrow/type.h
##
@@ -1604,13 +1605,26 @@ class ARROW_EXPORT FieldRef {
 // --
 // Schema
 
+enum class Endianness {
+  LITTLE = 0,
+  BIG = 1,
+#if ARROW_LITTLE_ENDIAN
+  NATIVE = LITTLE
+#else
+  NATIVE = BIG
+#endif
+};

Review comment:
   I will defer to @pitrou or @emkornfield on what to do -- I don't have a 
lot of bandwidth this month to help with writing PRs 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] vertexclique commented on pull request #8635: ARROW-10551: [Rust] Fix unreproducible benches

2020-11-11 Thread GitBox


vertexclique commented on pull request #8635:
URL: https://github.com/apache/arrow/pull/8635#issuecomment-725714558


   @alamb I am unsure about the seedable_rng behavior right now. But let's go 
with this one. You can check the results here:
   https://gist.github.com/vertexclique/44158100c6c019d3d6f02ef94606d91a
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8645: ARROW-10561: [Rust] Small simplification of `write` and `write_bytes`

2020-11-11 Thread GitBox


github-actions[bot] commented on pull request #8645:
URL: https://github.com/apache/arrow/pull/8645#issuecomment-725727007


   https://issues.apache.org/jira/browse/ARROW-10561



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] nealrichardson commented on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function

2020-11-11 Thread GitBox


nealrichardson commented on pull request #8256:
URL: https://github.com/apache/arrow/pull/8256#issuecomment-725644439


   @kszucs For some reason the `test-r-linux-as-cran` job (branch) isn't being 
created on crossbow. Unfortunately the crossbow output (e.g. 
https://github.com/apache/arrow/runs/1386912319?check_suite_focus=true) doesn't 
provide any information about it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8643: ARROW-10522: [R] Allow rename Table and RecordBatch columns with names()

2020-11-11 Thread GitBox


github-actions[bot] commented on pull request #8643:
URL: https://github.com/apache/arrow/pull/8643#issuecomment-725660537


   https://issues.apache.org/jira/browse/ARROW-10522



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #8256: ARROW-9001: [R] Box outputs as correct type in call_function

2020-11-11 Thread GitBox


kou commented on pull request #8256:
URL: https://github.com/apache/arrow/pull/8256#issuecomment-725660754


   FYI: https://github.com/apache/arrow/pull/8386#issuecomment-725085209



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] njwhite opened a new pull request #8644: [WIP] Align written buffers to specified value

2020-11-11 Thread GitBox


njwhite opened a new pull request #8644:
URL: https://github.com/apache/arrow/pull/8644


   See mailing list discussion for now



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel

2020-11-11 Thread GitBox


cyb70289 commented on a change in pull request #8637:
URL: https://github.com/apache/arrow/pull/8637#discussion_r521776788



##
File path: docs/source/cpp/compute.rst
##
@@ -140,7 +140,7 @@ Aggregations
 
+--+++---++
 | min_max  | Unary  | Numeric| Scalar Struct  
(1)| :struct:`MinMaxOptions`|
 
+--+++---++
-| mode | Unary  | Numeric| Scalar Struct  
(2)||
+| mode | Unary  | Numeric| Scalar Struct  
(2)| :struct:`ModeOptions`  |

Review comment:
   Mode now returns an array of struct. Should it be "Struct Array"? or 
simply "Struct'.
   Should I also change above line (min_max) to "Struct"? or leave it "Scalar 
Struct".





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorgecarleitao closed pull request #8639: ARROW-10559: [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules

2020-11-11 Thread GitBox


jorgecarleitao closed pull request #8639:
URL: https://github.com/apache/arrow/pull/8639


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

2020-11-11 Thread GitBox


github-actions[bot] commented on pull request #8647:
URL: https://github.com/apache/arrow/pull/8647#issuecomment-725759470


   https://issues.apache.org/jira/browse/ARROW-10549



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #8648: ARROW-7906: [C++] Add ORC write support

2020-11-11 Thread GitBox


github-actions[bot] commented on pull request #8648:
URL: https://github.com/apache/arrow/pull/8648#issuecomment-725811236


   https://issues.apache.org/jira/browse/ARROW-7906



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] mathyingzhou opened a new pull request #8648: ARROW-7906: [C++] Add ORC write support

2020-11-11 Thread GitBox


mathyingzhou opened a new pull request #8648:
URL: https://github.com/apache/arrow/pull/8648


   This pull request tracks the progress on adding ORC write support. The 
functionality is not complete yet. However for most types the process of 
populating a ColumnVectorBatch in ORC using data from Arrow Array.
   
   Arrow data types (arrow::Type::type) I do support:
   Boolean: BOOL
   Numerical: INT8, INT16, INT32, INT64, FLOAT, DOUBLE
   Time-related: DATE32
   Binary: BINARY, STRING, LARGE_BINARY, LARGE_STRING, FIXED_SIZE_BINARY
   Nested: LIST, LARGE_LIST, FIXED_SIZE_LIST, STRUCT, MAP, DENSE_UNION, 
SPARSE_UNION
   
   Arrow data types I plan to support:
   Numerical: DECIMAL128
   Time-related: DATE64, TIMESTAMP
   Dictionary: DICTIONARY
   
   Arrow data types I currently do NOT plan to support:
   Numerical: UINT8, UINT16, UINT32, UINT64, HALF_FLOAT, DECIMAL256 (There are 
no corresponding types in ORC. Of course except for in the case of DECIMAL256 
we can always cast them into larger types. However I think maybe users need to 
explicitly do that.)
   Time-related: TIME32, TIME64, INTERVAL_MONTHS, INTERVAL_DAY_TIME, DURATION 
(There are no corresponding types in ORC and it is impossible to cast them into 
ORC types without losing time-related information)
   Extension: EXTENSION 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] arw2019 commented on pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data

2020-11-11 Thread GitBox


arw2019 commented on pull request #8294:
URL: https://github.com/apache/arrow/pull/8294#issuecomment-725852668


   Turning to draft while I work on the rebase



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-11-11 Thread GitBox


kiszk commented on a change in pull request #7507:
URL: https://github.com/apache/arrow/pull/7507#discussion_r521763175



##
File path: cpp/src/arrow/type.h
##
@@ -1604,13 +1605,26 @@ class ARROW_EXPORT FieldRef {
 // --
 // Schema
 
+enum class Endianness {
+  LITTLE = 0,
+  BIG = 1,
+#if ARROW_LITTLE_ENDIAN
+  NATIVE = LITTLE
+#else
+  NATIVE = BIG
+#endif
+};

Review comment:
   I see.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel

2020-11-11 Thread GitBox


cyb70289 commented on a change in pull request #8637:
URL: https://github.com/apache/arrow/pull/8637#discussion_r521865256



##
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##
@@ -205,35 +243,59 @@ struct ModeImpl : public ScalarAggregator {
 this->state.MergeFrom(std::move(other.state));
   }
 
-  void Finalize(KernelContext*, Datum* out) override {
-using ModeType = typename TypeTraits::ScalarType;
-using CountType = typename TypeTraits::ScalarType;
+  static std::shared_ptr MakeArrayData(
+  const std::shared_ptr& data_type, int64_t n) {
+auto data = ArrayData::Make(data_type, n, 0);
+data->buffers.resize(2);
+data->buffers[0] = nullptr;
+return data;
+  }
 
-std::vector> values;
-auto mode_count = this->state.Finalize();
-auto mode = mode_count.first;
-auto count = mode_count.second;
-if (count == 0) {
-  values = {std::make_shared(), std::make_shared()};
-} else {
-  values = {std::make_shared(mode), 
std::make_shared(count)};
+  void Finalize(KernelContext* ctx, Datum* out) override {
+const auto& mode_type = TypeTraits::type_singleton();
+const auto& count_type = int64();
+const auto& out_type =
+struct_({field(kModeFieldName, mode_type), field(kCountFieldName, 
count_type)});
+
+int n = this->options.n;
+if (n > state.DistinctValues()) {
+  n = state.DistinctValues();
+}
+if (n <= 0) {
+  *out = Datum(StructArray(out_type, 0, ArrayVector{}).data());

Review comment:
   Done

##
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##
@@ -205,35 +243,59 @@ struct ModeImpl : public ScalarAggregator {
 this->state.MergeFrom(std::move(other.state));
   }
 
-  void Finalize(KernelContext*, Datum* out) override {
-using ModeType = typename TypeTraits::ScalarType;
-using CountType = typename TypeTraits::ScalarType;
+  static std::shared_ptr MakeArrayData(
+  const std::shared_ptr& data_type, int64_t n) {
+auto data = ArrayData::Make(data_type, n, 0);
+data->buffers.resize(2);
+data->buffers[0] = nullptr;
+return data;
+  }
 
-std::vector> values;
-auto mode_count = this->state.Finalize();
-auto mode = mode_count.first;
-auto count = mode_count.second;
-if (count == 0) {
-  values = {std::make_shared(), std::make_shared()};
-} else {
-  values = {std::make_shared(mode), 
std::make_shared(count)};
+  void Finalize(KernelContext* ctx, Datum* out) override {
+const auto& mode_type = TypeTraits::type_singleton();
+const auto& count_type = int64();
+const auto& out_type =
+struct_({field(kModeFieldName, mode_type), field(kCountFieldName, 
count_type)});
+
+int n = this->options.n;
+if (n > state.DistinctValues()) {
+  n = state.DistinctValues();
+}
+if (n <= 0) {
+  *out = Datum(StructArray(out_type, 0, ArrayVector{}).data());
+  return;
 }
-out->value = std::make_shared(std::move(values), 
this->out_type);
+
+auto mode_data = this->MakeArrayData(mode_type, n);
+auto count_data = this->MakeArrayData(count_type, n);
+KERNEL_ASSIGN_OR_RAISE(mode_data->buffers[1], ctx, ctx->Allocate(n * 
sizeof(CType)));
+KERNEL_ASSIGN_OR_RAISE(count_data->buffers[1], ctx,
+   ctx->Allocate(n * sizeof(int64_t)));
+
+CType* mode_buffer = mode_data->template GetMutableValues(1);
+int64_t* count_buffer = count_data->template GetMutableValues(1);
+this->state.Finalize(mode_buffer, count_buffer, n);
+
+*out = Datum(
+StructArray(out_type, n, ArrayVector{MakeArray(mode_data), 
MakeArray(count_data)})

Review comment:
   Done

##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -735,71 +735,63 @@ class TestPrimitiveModeKernel : public ::testing::Test {
  public:
   using ArrowType = T;
   using Traits = TypeTraits;
-  using c_type = typename ArrowType::c_type;
-  using ModeType = typename Traits::ScalarType;
-  using CountType = typename TypeTraits::ScalarType;
-
-  void AssertModeIs(const Datum& array, c_type expected_mode, int64_t 
expected_count) {
-ASSERT_OK_AND_ASSIGN(Datum out, Mode(array));
-const StructScalar& value = out.scalar_as();
+  using CType = typename ArrowType::c_type;
+
+  void AssertModesAre(const Datum& array, const int n,
+  const std::vector& expected_modes,
+  const std::vector& expected_counts) {
+ASSERT_OK_AND_ASSIGN(Datum out, Mode(array, ModeOptions{n}));

Review comment:
   Done

##
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##
@@ -735,71 +735,63 @@ class TestPrimitiveModeKernel : public ::testing::Test {
  public:
   using ArrowType = T;
   using Traits = TypeTraits;
-  using c_type = typename ArrowType::c_type;
-  using ModeType = typename Traits::ScalarType;
-  using CountType = typename TypeTraits::ScalarType;
-
-  void AssertModeIs(const 

[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel

2020-11-11 Thread GitBox


cyb70289 commented on a change in pull request #8637:
URL: https://github.com/apache/arrow/pull/8637#discussion_r521865103



##
File path: cpp/src/arrow/compute/api_aggregate.h
##
@@ -76,6 +76,18 @@ struct ARROW_EXPORT MinMaxOptions : public FunctionOptions {
   enum Mode null_handling;
 };
 
+/// \brief Control Mode kernel behavior
+///
+/// Returns top-n common values and counts.
+/// By default, returns the most common value and count.
+struct ARROW_EXPORT ModeOptions : public FunctionOptions {
+  explicit ModeOptions(int n = 1) : n(n) {}

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on a change in pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel

2020-11-11 Thread GitBox


cyb70289 commented on a change in pull request #8637:
URL: https://github.com/apache/arrow/pull/8637#discussion_r521865178



##
File path: cpp/src/arrow/compute/kernels/aggregate_mode.cc
##
@@ -28,6 +29,9 @@ namespace internal {
 
 namespace {
 
+constexpr char kModeFieldName[] = "modes";
+constexpr char kCountFieldName[] = "counts";

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] cyb70289 commented on pull request #8637: ARROW-10021: [C++][Compute] Return top-n modes in mode kernel

2020-11-11 Thread GitBox


cyb70289 commented on pull request #8637:
URL: https://github.com/apache/arrow/pull/8637#issuecomment-725891740


   Python unit test failed. I'm updating mode kernel python binding per cpp 
changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




<    1   2