[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-23 Thread GitBox
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443990095 ## File path: cpp/src/arrow/sparse_tensor_test.cc ## @@ -49,7 +49,10 @@ static inline void AssertCOOIndex(const std::shared_ptr& sidx, const int } }

[GitHub] [arrow] cyb70289 edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
cyb70289 edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768 > I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort

[GitHub] [arrow] praveenbingo closed pull request #7495: ARROW-9185: [Java][Gandiva] Make llvm build optimisation configurable from java

2020-06-23 Thread GitBox
praveenbingo closed pull request #7495: URL: https://github.com/apache/arrow/pull/7495 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-23 Thread GitBox
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443982716 ## File path: python/pyarrow/tensor.pxi ## @@ -339,6 +350,15 @@ shape: {0.shape}""".format(self) def non_zero_length(self): return

[GitHub] [arrow] pitrou commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-23 Thread GitBox
pitrou commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-648106332 > Can these comments give you an understanding? No, they don't. They don't explain _why_ the flag is useful. What does it bring to know that the indices are canonical? The

[GitHub] [arrow] pitrou commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
pitrou commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648113053 Perhaps @jorisvandenbossche can review this, because I don't know much about Pandas conversions and internals. This is

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-648147535 thanks @pitrou and @cyb70289 -- I will spend a little time on the count-sort implementation and post a new patch

[GitHub] [arrow] wesm closed pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
wesm closed pull request #7522: URL: https://github.com/apache/arrow/pull/7522 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
wesm commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648145200 +1, I'll go ahead and merge this since I confirmed the memory leak is fixed This is an automated message from the

[GitHub] [arrow] jorisvandenbossche commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed This is an automated

[GitHub] [arrow] jorisvandenbossche commented on pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7395: URL: https://github.com/apache/arrow/pull/7395#issuecomment-648165633 More comments on this? (apart from ensuring the tests pass) I should probably still add it to the filesystem docs.

[GitHub] [arrow] wesm commented on a change in pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7321: URL: https://github.com/apache/arrow/pull/7321#discussion_r444249804 ## File path: format/Schema.fbs ## @@ -134,11 +134,20 @@ table FixedSizeBinary { table Bool { } +/// Exact decimal value represented as an integer value

[GitHub] [arrow] pitrou closed pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
pitrou closed pull request #7521: URL: https://github.com/apache/arrow/pull/7521 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] wesm closed pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm closed pull request #7516: URL: https://github.com/apache/arrow/pull/7516 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] romainfrancois opened a new pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois opened a new pull request #7524: URL: https://github.com/apache/arrow/pull/7524 ``` r library(arrow, warn.conflicts = FALSE) tab <- Table$create( a = structure(1:4, foo = "bar"), b = haven::labelled(1:4, label = "description") ) tab$metadata$r #>

[GitHub] [arrow] github-actions[bot] commented on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648087751 https://issues.apache.org/jira/browse/ARROW-8733 This is an automated message from the Apache Git

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche edited a comment on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed, and the code looks good to me

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648162680 +1. The bot changes can't be done here so going to go ahead and merge this so I can use it more easily without having to switch branches (to use this branch) before running

[GitHub] [arrow] pitrou commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
pitrou commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-648019411 Let's leave sorting optimizations for another PR. I'll review this one. This is an automated message from the

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
jorisvandenbossche opened a new pull request #7523: URL: https://github.com/apache/arrow/pull/7523 Not a polished PR, just a quick try (in cython, since that's faster for me) to expose the RowGroupInfo statistics in Python + convert the expression into min/max information. More as food

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444295036 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444302116 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] wesm opened a new pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm opened a new pull request #7525: URL: https://github.com/apache/arrow/pull/7525 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444308097 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] rjzamora edited a comment on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora edited a comment on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I

[GitHub] [arrow] wesm edited a comment on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the initial visitor_inline.h changes gcc-8: ``` benchmark baseline

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444283172 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444281970 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444285972 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444293158 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444306795 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648238512 Here are some vector-hash benchmarks comparing this branch with master. The performance "regressions" are for the 99%-100% null cases, I'll take a quick look at these in the

[GitHub] [arrow] fsaintjacques commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-23 Thread GitBox
fsaintjacques commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648244980 I can't comment on the production quality of MinIO since I've never used it in such a scenario. I meant this for reference to other developers who want to test the S3

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444333447 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] rjzamora commented on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora commented on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I am

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444288120 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648230676 I'm not sure what the MSVC failure is about but I'll debug locally This is an automated message from the Apache Git

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444311774 ## File path: r/R/table.R ## @@ -202,7 +210,27 @@ Table$create <- function(..., schema = NULL) { #' @export as.data.frame.Table <- function(x,

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444322251 ## File path: docs/source/developers/contributing.rst ## @@ -76,46 +96,83 @@ visibility. They may add a "Fix version" to indicate that they're

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444330998 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] bkietz closed pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-23 Thread GitBox
bkietz closed pull request #7513: URL: https://github.com/apache/arrow/pull/7513 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] github-actions[bot] commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-648198565 https://issues.apache.org/jira/browse/ARROW-8899 This is an automated message from the Apache Git

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444268553 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] romainfrancois commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444273703 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type

[GitHub] [arrow] lionel- commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
lionel- commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444292367 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444320449 ## File path: docs/source/developers/contributing.rst ## @@ -168,11 +274,15 @@ remote repo still holds the old history, you would need to do a

[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-23 Thread GitBox
alippai commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648247832 Thanks, now I understand. So the pairing with toxiproxy is for the testing :)) That's what you wrote, I just misunderstood

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648270948 OK I'm done twiddling this, here is the latest comparison of the hash benchmarks versus master with gcc-8: ``` benchmark baseline

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648235922 Here's what I see in the sort benchmarks with this patch compared with 7ed698b94, the patch right before the visitor_inline.h changes ```

[GitHub] [arrow] github-actions[bot] commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648240180 https://issues.apache.org/jira/browse/ARROW-9214 This is an automated message from the Apache Git

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444318497 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] bkietz commented on pull request #7493: ARROW-9183: [C++] Fix build with clang & old libstdc++.

2020-06-23 Thread GitBox
bkietz commented on pull request #7493: URL: https://github.com/apache/arrow/pull/7493#issuecomment-648252136 Hmm, there's a failure building with GCC 4.8 https://github.com/apache/arrow/pull/7493/checks?check_run_id=791725319#step:9:534 The `#ifdef` condition seems to be failing to

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the visitor_inline.h changes gcc-8: ``` benchmark baseline

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-23 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-648320579 Are there any comments about this approach for preparing test cases between different endians? cc @pitrou @wesm If not, I will prepare other tests (but disabled now) with this

[GitHub] [arrow] paddyhoran commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
paddyhoran commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r43777 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }

[GitHub] [arrow] paddyhoran closed pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-23 Thread GitBox
paddyhoran closed pull request #7466: URL: https://github.com/apache/arrow/pull/7466 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] wesm closed pull request #7518: ARROW-9138: [Docs][Format] Make sure format version is hard coded in the docs

2020-06-23 Thread GitBox
wesm closed pull request #7518: URL: https://github.com/apache/arrow/pull/7518 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
kszucs commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648473447 I’m going to update the bot tomorrow. This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] wesm closed pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm closed pull request #7529: URL: https://github.com/apache/arrow/pull/7529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on a change in pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7530: URL: https://github.com/apache/arrow/pull/7530#discussion_r444564799 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -39,7 +40,7 @@ namespace arrow { namespace compute { template -class

[GitHub] [arrow] wesm opened a new pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm opened a new pull request #7530: URL: https://github.com/apache/arrow/pull/7530 I also did a little bit of cleaning, moving some stuff into `arrow::compute::internal`. This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648482567 Example use in Python: ``` In [14]: arr = pa.array(pd.date_range('2000-01-01', periods=20))

[GitHub] [arrow] github-actions[bot] commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648484942 https://issues.apache.org/jira/browse/ARROW-8934 This is an automated message from the Apache Git

[GitHub] [arrow] bkietz opened a new pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz opened a new pull request #7526: URL: https://github.com/apache/arrow/pull/7526 The physical schema is required to validate predicates used for filtering row groups based on statistics. It can also be explicitly provided to ensure that if no row groups satisfy the predicate

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410615 I looked at the Parquet read/write benchmarks, the differences look like mostly noise to me ``` benchmark baseline contender

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410864 +1. We can work on performance smithing in follow up PRs This is an automated message from the Apache Git Service.

[GitHub] [arrow] wesm closed pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm closed pull request #7525: URL: https://github.com/apache/arrow/pull/7525 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] maxburke commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
maxburke commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r70083 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r86902 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; }

[GitHub] [arrow] github-actions[bot] commented on pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7526: URL: https://github.com/apache/arrow/pull/7526#issuecomment-648401641 https://issues.apache.org/jira/browse/ARROW-9146 This is an automated message from the Apache Git

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
fsaintjacques commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r92049 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; }

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even means

[GitHub] [arrow] bkietz commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r444509707 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; } -Result

[GitHub] [arrow] wesm closed pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
wesm closed pull request #7528: URL: https://github.com/apache/arrow/pull/7528 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7528: URL: https://github.com/apache/arrow/pull/7528#issuecomment-648561165 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] wesm commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648561822 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] wesm closed pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm closed pull request #7530: URL: https://github.com/apache/arrow/pull/7530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on pull request #7501: ARROW-9192: [Rust] run clippy to lint arrow crate in CI

2020-06-23 Thread GitBox
wesm commented on pull request #7501: URL: https://github.com/apache/arrow/pull/7501#issuecomment-648562595 Hm I think this lint step should be merged into the main Lint workflow. @kszucs can you help? This is an automated

[GitHub] [arrow] houqp commented on pull request #7501: ARROW-9192: [Rust] run clippy to lint arrow crate in CI

2020-06-23 Thread GitBox
houqp commented on pull request #7501: URL: https://github.com/apache/arrow/pull/7501#issuecomment-648576776 @kszucs let me know if there is anything i can help to move it to the main lint workflow. This is an automated

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648428724 Adding to my previous comments: if only at the top level, I'm not sure what the ramification of that would mean at the Java codebase. I think it would require a fairly massive

[GitHub] [arrow] jacques-n edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even

[GitHub] [arrow] jacques-n commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be uncha

2020-06-23 Thread GitBox
jacques-n commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r444514257 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void

[GitHub] [arrow] wesm edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648439435 FTR I'm OK with dropping the top-level validity bitmap from Union, especially if it helps us move forward This is

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648446373 I'm able to reproduce the error in VS and set breakpoints, I got this far to see that GetBatchWithDictSpaced has decoded more values than it was asked to

[GitHub] [arrow] nealrichardson opened a new pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson opened a new pull request #7527: URL: https://github.com/apache/arrow/pull/7527 Sprinkles `Rf_translateCharUTF8` a few places. I tried to add tests for all of the different scenarios I could think of where we could have non-UTF strings. Also includes `$` and `[[`

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648451899 there seems to be a situation where the bit run has more values than are needed to fulfill the call to `GetSpaced`

[GitHub] [arrow] github-actions[bot] commented on pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7527: URL: https://github.com/apache/arrow/pull/7527#issuecomment-648451652 https://issues.apache.org/jira/browse/ARROW-7018 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648453423 @emkornfield I'm sort of at a dead end here, hopefully the above gives you some clues about where there might be a problem

[GitHub] [arrow] wesm closed pull request #7470: ARROW-8025: [C++] Implement cast from String to Binary

2020-06-23 Thread GitBox
wesm closed pull request #7470: URL: https://github.com/apache/arrow/pull/7470 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] github-actions[bot] commented on pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7528: URL: https://github.com/apache/arrow/pull/7528#issuecomment-648472075 https://issues.apache.org/jira/browse/ARROW-8933 This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds This is an automated message from the Apache Git

[GitHub] [arrow] github-actions[bot] commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472074 https://issues.apache.org/jira/browse/ARROW-8025 This is an automated message from the Apache Git

[GitHub] [arrow] wesm opened a new pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm opened a new pull request #7529: URL: https://github.com/apache/arrow/pull/7529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] wesm edited a comment on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds This is an automated message from the Apache Git

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are

[GitHub] [arrow] nealrichardson commented on a change in pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7527: URL: https://github.com/apache/arrow/pull/7527#discussion_r444530279 ## File path: r/src/array_from_vector.cpp ## @@ -159,6 +159,9 @@ struct VectorToArrayConverter { if (s == NA_STRING) {
