[GitHub] [arrow] github-actions[bot] commented on pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7520: URL: https://github.com/apache/arrow/pull/7520#issuecomment-647796781 https://issues.apache.org/jira/browse/ARROW-9189 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487 Here's a benchmark run with gcc-8 ``` --- BenchmarkTime CPU Iterations

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487 Here's a benchmark run with gcc-8 ``` --- BenchmarkTime CPU

[GitHub] [arrow] nealrichardson commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-22 Thread GitBox
nealrichardson commented on a change in pull request #7435: URL: https://github.com/apache/arrow/pull/7435#discussion_r443735874 ## File path: r/src/array_from_vector.cpp ## @@ -201,6 +202,67 @@ struct VectorToArrayConverter { return Status::OK(); } + template +

[GitHub] [arrow] pitrou closed pull request #7504: ARROW-9193: [C++] Avoid spurious intermediate string copy in ToDateHolder

2020-06-22 Thread GitBox
pitrou closed pull request #7504: URL: https://github.com/apache/arrow/pull/7504 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-647694302 Here's what I see with gcc-8: ``` benchmark baseline contender change % regression 1 BM_ReadColumn/1/1 1.336

[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-22 Thread GitBox
alippai commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-647789155 Does "for testing and benchmarking" mean that it's not optimal to store arrow/parquet files on Minio for production workloads?

[GitHub] [arrow] github-actions[bot] commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647794119 Revision: 56a5e1ce92e541cfaac827d5334cbe8981b1b74b Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] kou closed pull request #7508: ARROW-9202: [GLib] Add GArrowDatum

2020-06-22 Thread GitBox
kou closed pull request #7508: URL: https://github.com/apache/arrow/pull/7508 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256 FWIW on the "gcc/clang perf discussion", clang also shows performance benefits and limited downside ``` benchmark baselinecontender

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647825256 FWIW on the "gcc/clang perf discussion", clang-8 also shows performance benefits and limited downside ``` benchmark baseline

[GitHub] [arrow] projjal commented on pull request #7504: ARROW-9193: [C++] Avoid spurious intermediate string copy in ToDateHolder

2020-06-22 Thread GitBox
projjal commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647691097 > Any reason not to use a `util::string_view`? Since the toDate function here takes a char array and just passes it onto the parseTimestamp method, using a

[GitHub] [arrow] fsaintjacques commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
fsaintjacques commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647698721 ursabot uses `tabulate` which I think is smaller dependencies. This is an automated message from the Apache

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-647699240 Here's with clang, even less of an issue there it seems ``` benchmark baselinecontender change % regression 32

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647758158 I’m sort of -1 on using anything but pandas for data munging and data presentation in our tooling. It’s not a very large dependency and has everything we need.

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-22 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r443733083 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647696604 +1. Appveyor build https://ci.appveyor.com/project/wesm/arrow/builds/33669506 This is an automated message from the

[GitHub] [arrow] wesm closed pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm closed pull request #7506: URL: https://github.com/apache/arrow/pull/7506 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] kou commented on a change in pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-22 Thread GitBox
kou commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r443832986 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -409,6 +414,32 @@ class IpcTestFixture : public io::MemoryMapFixture, public ExtensionTypesMixin {

[GitHub] [arrow] kou commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
kou commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647793370 @github-actions crossbow submit debian-buster-arm64 This is an automated message from the Apache Git Service. To

[GitHub] [arrow] kou commented on pull request #7508: ARROW-9202: [GLib] Add GArrowDatum

2020-06-22 Thread GitBox
kou commented on pull request #7508: URL: https://github.com/apache/arrow/pull/7508#issuecomment-647793770 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647823397 Also on the binary size, these changes add about 75KB to libarrow.so. My guess is the difference is mostly coming from code inlining for the all-null case (which wasn't split out

[GitHub] [arrow] nealrichardson opened a new pull request #7518: ARROW-9138: [Docs][Format] Make sure format version is hard coded in the docs

2020-06-22 Thread GitBox
nealrichardson opened a new pull request #7518: URL: https://github.com/apache/arrow/pull/7518 Since after 1.0, the library versions will continue to increase but the format will remain at 1.0 until it is modified, we need to version the specification docs separately from the library

[GitHub] [arrow] github-actions[bot] commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647759816 Revision: 77bc06d2ff96b41dd1e37838601f92fd0e95e7e2 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
wesm commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647763853 @xhochy I'm fine with merging this -- it's a non-mandatory dependency right (sorry have not reviewed the patch yet and the PR description does not say)?

[GitHub] [arrow] kou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
kou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647791287 Ah, we need the `UTF8PROC_STATIC` definition. Does this work? ```diff diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake

[GitHub] [arrow] wesm opened a new pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm opened a new pull request #7521: URL: https://github.com/apache/arrow/pull/7521 This significantly speeds up processing of mostly-not-null or mostly-null data, while having almost no overhead for the other scenarios where you rarely have a word-sized run of all-not-null or

[GitHub] [arrow] xhochy commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
xhochy commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647688900 @maartenbreddels Feel free to rebase again, this patch is ready for a merge. @wesm Should we merge this as-is or is delay it until @maartenbreddels branch is ready?

[GitHub] [arrow] fsaintjacques opened a new pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-22 Thread GitBox
fsaintjacques opened a new pull request #7517: URL: https://github.com/apache/arrow/pull/7517 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] github-actions[bot] commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647820320 https://issues.apache.org/jira/browse/ARROW-9210 This is an automated message from the Apache Git

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487 Here's a benchmark run with gcc-8 ``` --- BenchmarkTime

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647711867 @xhochy I see at least an `undefined reference to `_imp__utf8proc_toupper'` in the MinGW builds, hope that helps.

[GitHub] [arrow] kszucs opened a new pull request #7519: ARROW-9153: [C++][Python] Refactor scalar bindings [WIP]

2020-06-22 Thread GitBox
kszucs opened a new pull request #7519: URL: https://github.com/apache/arrow/pull/7519 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] github-actions[bot] commented on pull request #7519: ARROW-9153: [C++][Python] Refactor scalar bindings [WIP]

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-647754455 https://issues.apache.org/jira/browse/ARROW-9153 This is an automated message from the Apache Git

[GitHub] [arrow] kou commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
kou commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647758878 @github-actions crossbow submit debian-buster-arm64 This is an automated message from the Apache Git Service. To

[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
kszucs commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647798092 Using pandas is not a problem, but other than sorting the results we cannot really improve the look and feel.

[GitHub] [arrow] kszucs edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
kszucs edited a comment on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647798092 Using pandas is not a problem, but the results cannot be improved much other than sorting the table. This

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487 Here's a benchmark run with gcc-8 ``` --- BenchmarkTime CPU

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-647690792 I rebased. I'm going to look at the benchmarks locally and then probably merge this so further performance work (including comparing with BitBlockCounter) can be pursued as follow

[GitHub] [arrow] nealrichardson commented on pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
nealrichardson commented on pull request #7503: URL: https://github.com/apache/arrow/pull/7503#issuecomment-647691008 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] nealrichardson closed pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
nealrichardson closed pull request #7503: URL: https://github.com/apache/arrow/pull/7503 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow] github-actions[bot] commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-647699887 https://issues.apache.org/jira/browse/ARROW-1682 This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] commented on pull request #7518: ARROW-9138: [Docs][Format] Make sure format version is hard coded in the docs

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7518: URL: https://github.com/apache/arrow/pull/7518#issuecomment-647699885 https://issues.apache.org/jira/browse/ARROW-9138 This is an automated message from the Apache Git

[GitHub] [arrow] wesm edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647758158 I’m sort of -1 on using anything but pandas for data munging and data presentation in our tooling. It’s not a very large dependency and has everything we need. FWIW, the

[GitHub] [arrow] nealrichardson opened a new pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-22 Thread GitBox
nealrichardson opened a new pull request #7520: URL: https://github.com/apache/arrow/pull/7520 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-22 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r443912778 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -409,6 +414,32 @@ class IpcTestFixture : public io::MemoryMapFixture, public ExtensionTypesMixin {

[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
cyb70289 commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768 > I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort kernels.

[GitHub] [arrow] kou commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
kou commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647833733 @github-actions crossbow submit debian-buster-arm64 This is an automated message from the Apache Git Service. To

[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443922882 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names:

[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443928439 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names:

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747 Sorting seems too important to leave it to these relatively complex templates, I would suggest implementing the counting sort without using `VisitArrayDataInline`. I'm happy

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747 Sorting seems too important to leave it to these relatively complex templates, I would suggest implementing the counting sort without using `VisitArrayDataInline`

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647888437 FWIW the performance issue seems to be more pronounced on gcc than clang, here is the benchmark comparison on my machine with clang-8 ```

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647888437 FWIW the performance issue seems to be more pronounced on gcc than clang, here is the benchmark comparison on my machine with clang-8 ```

[GitHub] [arrow] emkornfield commented on pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged

2020-06-22 Thread GitBox
emkornfield commented on pull request #6402: URL: https://github.com/apache/arrow/pull/6402#issuecomment-647907608 @jacques-n are you happy with the changes? This is an automated message from the Apache Git Service. To

[GitHub] [arrow] github-actions[bot] commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647836070 Revision: cb16452e499fb61e6ec231fed4c2ddc7fe88d11b Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] wesm commented on a change in pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-22 Thread GitBox
wesm commented on a change in pull request #7522: URL: https://github.com/apache/arrow/pull/7522#discussion_r443913968 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -1269,13 +1283,18 @@ class DatetimeTZWriter : public DatetimeNanoWriter { :

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647860226 @kou utf8proc should only be used in a small number of compilation units, so what do you think about just using `set_target_properties(... PROPERTIES COMPILE_DEFINITIONS

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135 I'm refactoring to nix util::optional. This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] kou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
kou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647834815 The change doesn't add `UTF8PROC_STATIC` definition... This is an automated message from the Apache Git Service. To

[GitHub] [arrow] kou commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
kou commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647834442 TODO: * Need the `UTF8PROC_STATIC` definition with bundled utf8proc * Need to add include path for `utf8proc.h`

[GitHub] [arrow] kou closed pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
kou closed pull request #7509: URL: https://github.com/apache/arrow/pull/7509 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] github-actions[bot] commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-647857111 https://issues.apache.org/jira/browse/ARROW-8801 This is an automated message from the Apache Git

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884747 Sorting seems too important to leave it to these relatively complex templates (for example, just after determining that a value is not null, the `optional` value is checked

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135 I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning.

[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647887135 I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort

[GitHub] [arrow] wesm edited a comment on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647860226 @kou utf8proc should only be used in a small number of compilation units, so what do you think about just using `set_source_files_properties(... PROPERTIES

[GitHub] [arrow] mrkn commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-647860707 > I don't understand the PR's motivation. You say that: > > > To support the integration with scipy.sparse.coo_matrix, it is necessary to add a flag in SparseCOOIndex > >

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647886669 Also, I don't really understand the use of `util::optional` in these templates. The user should pass separate lambdas for the not-null and null cases

[GitHub] [arrow] kou commented on pull request #7509: ARROW-9203: [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install

2020-06-22 Thread GitBox
kou commented on pull request #7509: URL: https://github.com/apache/arrow/pull/7509#issuecomment-647843737 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] emkornfield commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-22 Thread GitBox
emkornfield commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-647903916 It would be great if @jacques-n could confirm he is OK with these changes. I believe on a previous PR the extra indirection was a concern.

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-22 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r443914793 ## File path: docs/source/developers/contributing.rst ## @@ -17,53 +17,73 @@ .. _contributing: -*** -Contribution Guidelines

[GitHub] [arrow] emkornfield commented on a change in pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-22 Thread GitBox
emkornfield commented on a change in pull request #7321: URL: https://github.com/apache/arrow/pull/7321#discussion_r443958283 ## File path: format/Schema.fbs ## @@ -134,11 +134,20 @@ table FixedSizeBinary { table Bool { } +/// Exact decimal value represented as an integer

[GitHub] [arrow] wesm opened a new pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-22 Thread GitBox
wesm opened a new pull request #7522: URL: https://github.com/apache/arrow/pull/7522 An array object was failing to be decref'd on the DatetimeTZ conversion path. The code is slightly complicated by the different calling reference ownership semantics of the Array/ChunkedArray conversion

[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
cyb70289 commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647878260 @ursabot benchmark --suite-filter=arrow-compute-vector-sort-benchmark This is an automated message from the

[GitHub] [arrow] ursabot commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
ursabot commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647883517 [AMD64 Ubuntu 18.04 C++ Benchmark (#114347)](https://ci.ursalabs.org/#builders/73/builds/89) builder has been succeeded. Revision: dbd166df749e73cbf7c1ec0c6cfa5837280aa32d

[GitHub] [arrow] cyb70289 commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-22 Thread GitBox
cyb70289 commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647884195 I see big performance drop from some counting sort cases, also tested on my local machine. Should be related to these visitor code:

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647883524 I improved the output to show the `state.counters` stuff ``` benchmark baselinecontender change %

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647649670 For curiosity, here are the same benchmarks on my laptop with gcc-8 (using the new output formatting from ARROW-9201) ``` benchmark

[GitHub] [arrow] pitrou commented on a change in pull request #7477: ARROW-4221: Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443707259 ## File path: python/pyarrow/tensor.pxi ## @@ -339,6 +350,15 @@ shape: {0.shape}""".format(self) def non_zero_length(self): return

[GitHub] [arrow] pitrou commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
pitrou commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647673667 Just a small question: why are `m` and `b` used for millions and billions, respectively? (I would probably expect `M` and `G`)

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-647674585 @emkornfield I'm sorry that I've been neglecting this PR. I will try to rebase this and investigate the perf questions a little bit

[GitHub] [arrow] kszucs edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
kszucs edited a comment on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647677894 > @kszucs can you assist me with adapting ursabot for these changes? Sure. > I think we can use pandas's `DataFrame.to_html` to create a colorized table for

[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
kszucs commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647677894 > @kszucs can you assist me with adapting ursabot for these changes? Sure. > I think we can use pandas's `DataFrame.to_html` to create a colorized table for GitHub, too

[GitHub] [arrow] romainfrancois commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-22 Thread GitBox
romainfrancois commented on a change in pull request #7435: URL: https://github.com/apache/arrow/pull/7435#discussion_r443386121 ## File path: r/src/array_from_vector.cpp ## @@ -201,6 +202,67 @@ struct VectorToArrayConverter { return Status::OK(); } + template +

[GitHub] [arrow] pitrou commented on pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647397849 @projjal Where do the null-terminated strings come from? Doesn't Gandiva operate on Arrow data? This is an

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443428904 ## File path: cpp/src/arrow/util/value_parsing.h ## @@ -565,6 +565,39 @@ static inline bool ParseTimestampStrptime(const char* buf, size_t length,

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443432735 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] pitrou closed pull request #7511: ARROW-9205: [Documentation] Fix typos in Columnar.rst

2020-06-22 Thread GitBox
pitrou closed pull request #7511: URL: https://github.com/apache/arrow/pull/7511 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647408989 https://issues.apache.org/jira/browse/ARROW-8149 This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647408675 Revision: 7e7b9f9d497d6256aaad68436a7d72bed4842c34 Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443441540 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] kszucs commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
kszucs commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443444034 ## File path: dev/tasks/conda-recipes/arrow-cpp/build-pyarrow.sh ## @@ -16,10 +16,26 @@ export PYARROW_WITH_ORC=1 export PYARROW_WITH_PARQUET=1 export

[GitHub] [arrow] pitrou commented on a change in pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7503: URL: https://github.com/apache/arrow/pull/7503#discussion_r443443849 ## File path: docs/source/developers/contributing.rst ## @@ -127,3 +127,52 @@ To contribute a patch: * Add new unit tests for your code. Thank you in

[GitHub] [arrow] kszucs commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
kszucs commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443447582 ## File path: dev/tasks/conda-recipes/arrow-cpp/meta.yaml ## @@ -1,124 +1,237 @@ +{% set version = ARROW_VERSION %} +{% set number = "6" %} +{% set

[GitHub] [arrow] xhochy removed a comment on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy removed a comment on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647416565 > > @xhochy have you manually updated the variant files or backported/copied from the upstream repo? > > Copied. I would also

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647429536 > I'd like to treat in-kernel type promotions as an anti-pattern in general. There are upsides and downsides to it. The downside is that users of the Arrow library

[GitHub] [arrow] tianchen92 commented on pull request #7496: ARROW-7084: [C++] ArrayRangeEquals should check for full type equality?

2020-06-22 Thread GitBox
tianchen92 commented on pull request #7496: URL: https://github.com/apache/arrow/pull/7496#issuecomment-647402771 With this change, generate_non_canonical_map_case would fail, i think it is because the map field names. What is the right fix here? @pitrou

[GitHub] [arrow] projjal commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
projjal commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443434522 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d

[GitHub] [arrow] pitrou closed pull request #7512: ARROW-9204: [C++][Flight] Change records_per_stream to int64

2020-06-22 Thread GitBox
pitrou closed pull request #7512: URL: https://github.com/apache/arrow/pull/7512 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

  1   2   >