[GitHub] [arrow] cyb70289 edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
cyb70289 edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647914768 > I'm refactoring to nix util::optional. I'm too tired to finish it tonight so I'll work on it tomorrow morning. If the perf regression isn't gone I'll rewrite the sort ke

[GitHub] [arrow] praveenbingo closed pull request #7495: ARROW-9185: [Java][Gandiva] Make llvm build optimisation configurable from java

2020-06-23 Thread GitBox
praveenbingo closed pull request #7495: URL: https://github.com/apache/arrow/pull/7495 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] pitrou commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
pitrou commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-648019411 Let's leave sorting optimizations for another PR. I'll review this one.

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
jorisvandenbossche opened a new pull request #7523: URL: https://github.com/apache/arrow/pull/7523 Not a polished PR, just a quick try (in cython, since that's faster for me) to expose the RowGroupInfo statistics in Python + convert the expression into min/max information. More as food for

[GitHub] [arrow] github-actions[bot] commented on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648087751 https://issues.apache.org/jira/browse/ARROW-8733

[GitHub] [arrow] pitrou commented on pull request #7477: ARROW-4221: [C++][Python] Add canonical flag in COO sparse index

2020-06-23 Thread GitBox
pitrou commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-648106332 > Can these comments give you an understanding? No, they don't. They don't explain _why_ the flag is useful. What does it bring to know that the indices are canonical? The PR

[GitHub] [arrow] pitrou commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
pitrou commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648113053 Perhaps @jorisvandenbossche can review this, because I don't know much about Pandas conversions and internals.

[GitHub] [arrow] pitrou closed pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
pitrou closed pull request #7521: URL: https://github.com/apache/arrow/pull/7521

[GitHub] [arrow] wesm commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
wesm commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648145200 +1, I'll go ahead and merge this since I confirmed the memory leak is fixed

[GitHub] [arrow] jorisvandenbossche commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche edited a comment on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed, and the code looks good to me

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-648147535 thanks @pitrou and @cyb70289 -- I will spend a little time on the count-sort implementation and post a new patch

[GitHub] [arrow] wesm closed pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
wesm closed pull request #7522: URL: https://github.com/apache/arrow/pull/7522

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648162680 +1. The bot changes can't be done here so going to go ahead and merge this so I can use it more easily without having to switch branches (to use this branch) before running benchmark

[GitHub] [arrow] wesm closed pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm closed pull request #7516: URL: https://github.com/apache/arrow/pull/7516

[GitHub] [arrow] jorisvandenbossche commented on pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7395: URL: https://github.com/apache/arrow/pull/7395#issuecomment-648165633 More comments on this? (apart from ensuring the tests pass) I should probably still add it to the filesystem docs.

[GitHub] [arrow] wesm commented on a change in pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7321: URL: https://github.com/apache/arrow/pull/7321#discussion_r444249804 ## File path: format/Schema.fbs ## @@ -134,11 +134,20 @@ table FixedSizeBinary { table Bool { } +/// Exact decimal value represented as an integer value in

[GitHub] [arrow] romainfrancois opened a new pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois opened a new pull request #7524: URL: https://github.com/apache/arrow/pull/7524 ``` r library(arrow, warn.conflicts = FALSE) tab <- Table$create( a = structure(1:4, foo = "bar"), b = haven::labelled(1:4, label = "description") ) tab$metadata$r #>

[GitHub] [arrow] github-actions[bot] commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-648198565 https://issues.apache.org/jira/browse/ARROW-8899

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444268553 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] romainfrancois commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444273703 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type (ARROW-7064)",

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444281970 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "data.

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444283172 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "data.

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444285972 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444288120 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] lionel- commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
lionel- commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444292367 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "data.frame")

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444293158 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444295036 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444302116 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "data.

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648230676 I'm not sure what the MSVC failure is about but I'll debug locally

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444306795 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type (ARROW-7064)",

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444308097 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x, "data.

[GitHub] [arrow] wesm opened a new pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm opened a new pull request #7525: URL: https://github.com/apache/arrow/pull/7525

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648235922 Here's what I see in the sort benchmarks with this patch compared with 7ed698b94, the patch right before the visitor_inline.h changes ```

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648238512 Here are some vector-hash benchmarks comparing this branch with master. The performance "regressions" are for the 99%-100% null cases, I'll take a quick look at these in the implemen

[GitHub] [arrow] github-actions[bot] commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648240180 https://issues.apache.org/jira/browse/ARROW-9214

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444311774 ## File path: r/R/table.R ## @@ -202,7 +210,27 @@ Table$create <- function(..., schema = NULL) { #' @export as.data.frame.Table <- function(x, ro

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444318497 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444320449 ## File path: docs/source/developers/contributing.rst ## @@ -168,11 +274,15 @@ remote repo still holds the old history, you would need to do a force

[GitHub] [arrow] fsaintjacques commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-23 Thread GitBox
fsaintjacques commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648244980 I can't comment on the production quality of MinIO since I've never used it in such a scenario. I meant this for reference to other developers who want to test the S3 binding

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444322251 ## File path: docs/source/developers/contributing.rst ## @@ -76,46 +96,83 @@ visibility. They may add a "Fix version" to indicate that they're consi

[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO fileystem dataset documentation

2020-06-23 Thread GitBox
alippai commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648247832 Thanks, now I understand. So the pairing with toxiproxy is for the testing :)) That's what you wrote, I just misunderstood

[GitHub] [arrow] bkietz commented on pull request #7493: ARROW-9183: [C++] Fix build with clang & old libstdc++.

2020-06-23 Thread GitBox
bkietz commented on pull request #7493: URL: https://github.com/apache/arrow/pull/7493#issuecomment-648252136 Hmm, there's a failure building with GCC 4.8 https://github.com/apache/arrow/pull/7493/checks?check_run_id=791725319#step:9:534 The `#ifdef` condition seems to be failing to dete

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444330998 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444333447 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] rjzamora commented on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora commented on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I am imag

[GitHub] [arrow] rjzamora edited a comment on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora edited a comment on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648270948 OK I'm done twiddling this, here is the latest comparison of the hash benchmarks versus master with gcc-8: ``` benchmark baseline contender

[GitHub] [arrow] bkietz closed pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-23 Thread GitBox
bkietz closed pull request #7513: URL: https://github.com/apache/arrow/pull/7513

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the visitor_inline.h changes gcc-8: ``` benchmark baseline contender

[GitHub] [arrow] wesm edited a comment on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the initial visitor_inline.h changes gcc-8: ``` benchmark baseline

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-23 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-648320579 Are there any comments about this approach for preparing test cases between different endians? cc @pitrou @wesm If not, I will prepare other tests (but disabled now) with this a

[GitHub] [arrow] wesm closed pull request #7518: ARROW-9138: [Docs][Format] Make sure format version is hard coded in the docs

2020-06-23 Thread GitBox
wesm closed pull request #7518: URL: https://github.com/apache/arrow/pull/7518

[GitHub] [arrow] paddyhoran commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
paddyhoran commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r43777 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }

[GitHub] [arrow] paddyhoran closed pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-23 Thread GitBox
paddyhoran closed pull request #7466: URL: https://github.com/apache/arrow/pull/7466

[GitHub] [arrow] maxburke commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
maxburke commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r70083 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }

[GitHub] [arrow] bkietz opened a new pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz opened a new pull request #7526: URL: https://github.com/apache/arrow/pull/7526 The physical schema is required to validate predicates used for filtering row groups based on statistics. It can also be explicitly provided to ensure that if no row groups satisfy the predicate n

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r86902 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; }

[GitHub] [arrow] github-actions[bot] commented on pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7526: URL: https://github.com/apache/arrow/pull/7526#issuecomment-648401641 https://issues.apache.org/jira/browse/ARROW-9146

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
fsaintjacques commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r92049 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; } -Resu

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410615 I looked at the Parquet read/write benchmarks, the differences look like mostly noise to me ``` benchmark baseline contender

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410864 +1. We can work on performance smithing in follow up PRs

[GitHub] [arrow] wesm closed pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm closed pull request #7525: URL: https://github.com/apache/arrow/pull/7525

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even means that

[GitHub] [arrow] bkietz commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r444509707 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; } -Result Parq

[GitHub] [arrow] jacques-n edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even mea

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648428724 Adding to my previous comments: if only at the top level, I'm not sure what the ramification of that would mean at the Java codebase. I think it would require a fairly massive r

[GitHub] [arrow] jacques-n commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be uncha

2020-06-23 Thread GitBox
jacques-n commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r444514257 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void splitAndTransferOffset

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are composed

[GitHub] [arrow] wesm edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are c

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648439435 FTR I'm OK with dropping the top-level validity bitmap from Union, especially if it helps us move forward

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648446373 I'm able to reproduce the error in VS and set breakpoints, I got this far to see that GetBatchWithDictSpaced has decoded more values than it was asked to ![image](https://user

[GitHub] [arrow] nealrichardson opened a new pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson opened a new pull request #7527: URL: https://github.com/apache/arrow/pull/7527 Sprinkles `Rf_translateCharUTF8` a few places. I tried to add tests for all of the different scenarios I could think of where we could have non-UTF strings. Also includes `$` and `[[` metho

[GitHub] [arrow] nealrichardson commented on a change in pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7527: URL: https://github.com/apache/arrow/pull/7527#discussion_r444530279 ## File path: r/src/array_from_vector.cpp ## @@ -159,6 +159,9 @@ struct VectorToArrayConverter { if (s == NA_STRING) { RETURN_NOT_OK(

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444533246 ## File path: docs/source/developers/contributing.rst ## @@ -76,46 +96,83 @@ visibility. They may add a "Fix version" to indicate that they're consi

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648451899 there seems to be a situation where the bit run has more values than are needed to fulfill the call to `GetSpaced` ![image](https://user-images.githubusercontent.com/329591/85

[GitHub] [arrow] github-actions[bot] commented on pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7527: URL: https://github.com/apache/arrow/pull/7527#issuecomment-648451652 https://issues.apache.org/jira/browse/ARROW-7018 This is an automated message from the Apache Git Serv

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648453423 @emkornfield I'm sort of at a dead end here, hopefully the above gives you some clues about where there might be a problem --

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648454941 The bug seems to be in the BitRunReader ![image](https://user-images.githubusercontent.com/329591/85470398-95a9fa00-b574-11ea-99c4-3f06db4a0179.png) -

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648465709 Thanks, I'll take a look tonight. Hopefully this should be enough of a clue. This is an automated message fro

[GitHub] [arrow] wesm opened a new pull request #7528: ARROW-8933: [C++] Trim redundant generated code from vector_hash.cc

2020-06-23 Thread GitBox
wesm opened a new pull request #7528: URL: https://github.com/apache/arrow/pull/7528 Since hashing doesn't know the difference between int64, uint64, float64, or timestamp when it comes to performing its work, there's no need to generate identical compiled code for each of these logical ty
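
The idea in this description, that hashing only needs the physical width of a value and not its logical type, can be illustrated with a minimal sketch. This is not the code from the pull request: `unique_by_bytes` is a hypothetical helper written for this note, and it ignores floating-point edge cases such as NaN and signed zero that a real kernel may need to treat separately.

```python
import struct
from typing import List


def unique_by_bytes(raw: bytes, width: int = 8) -> List[bytes]:
    """Return the distinct fixed-width slots of `raw` in first-seen order.

    Hypothetical helper: values are treated as opaque `width`-byte blobs,
    so one code path serves every 8-byte logical type.
    """
    seen = set()
    distinct = []
    for offset in range(0, len(raw), width):
        slot = raw[offset:offset + width]
        if slot not in seen:
            seen.add(slot)
            distinct.append(slot)
    return distinct


# Three logical types with the same 8-byte physical layout.
int64_buf = struct.pack("<4q", 1, -2, 1, -2)
uint64_buf = struct.pack("<4Q", 1, 2, 1, 2)
float64_buf = struct.pack("<4d", 1.5, 2.5, 1.5, 2.5)

# The same function deduplicates all of them; only decoding is type-specific.
int64_unique = [struct.unpack("<q", s)[0] for s in unique_by_bytes(int64_buf)]
uint64_unique = [struct.unpack("<Q", s)[0] for s in unique_by_bytes(uint64_buf)]
float64_unique = [struct.unpack("<d", s)[0] for s in unique_by_bytes(float64_buf)]
```

Only the final decoding step depends on the logical type, which is why a single instantiation per byte width can stand in for several per-type instantiations.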

[GitHub] [arrow] wesm closed pull request #7470: ARROW-8025: [C++] Implement cast from String to Binary

2020-06-23 Thread GitBox
wesm closed pull request #7470: URL: https://github.com/apache/arrow/pull/7470 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [arrow] github-actions[bot] commented on pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7528: URL: https://github.com/apache/arrow/pull/7528#issuecomment-648472075 https://issues.apache.org/jira/browse/ARROW-8933 This is an automated message from the Apache Git Serv

[GitHub] [arrow] wesm commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds This is an automated message from the Apache Git Service

[GitHub] [arrow] github-actions[bot] commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472074 https://issues.apache.org/jira/browse/ARROW-8025 This is an automated message from the Apache Git Serv

[GitHub] [arrow] wesm opened a new pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm opened a new pull request #7529: URL: https://github.com/apache/arrow/pull/7529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] wesm edited a comment on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds This is an automated message from the Apache Git

[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
kszucs commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648473447 I’m going to update the bot tomorrow. This is an automated message from the Apache Git Service. To respond to the m

[GitHub] [arrow] wesm commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648477046 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [arrow] wesm closed pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm closed pull request #7529: URL: https://github.com/apache/arrow/pull/7529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [arrow] wesm opened a new pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm opened a new pull request #7530: URL: https://github.com/apache/arrow/pull/7530 I also did a little bit of cleaning, moving some stuff into `arrow::compute::internal`. This is an automated message from the Apache Git S
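
As a rough illustration of the semantics named in the title (timestamp minus timestamp yields a duration), here is a small sketch in plain Python. `TimestampArray`, `DurationArray`, and `subtract` are hypothetical names for this note, not Arrow APIs, and rejecting mismatched units is an assumption made here rather than something the pull request states.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimestampArray:
    """Hypothetical stand-in: integer ticks since an epoch, in `unit`."""
    unit: str
    values: List[int]


@dataclass(frozen=True)
class DurationArray:
    """Hypothetical stand-in: integer tick counts, in `unit`."""
    unit: str
    values: List[int]


def subtract(left: TimestampArray, right: TimestampArray) -> DurationArray:
    """Element-wise left - right; the result type is a duration, not a timestamp."""
    if left.unit != right.unit:
        # Assumption for this sketch: units must already match.
        raise TypeError(f"unit mismatch: {left.unit} vs {right.unit}")
    if len(left.values) != len(right.values):
        raise ValueError("arrays must have the same length")
    diffs = [a - b for a, b in zip(left.values, right.values)]
    return DurationArray(left.unit, diffs)


start = TimestampArray("s", [0, 60, 120])
end = TimestampArray("s", [30, 90, 150])
elapsed = subtract(end, start)
```

The point of the sketch is the output type: the difference of two points in time is an interval, so it carries the same unit but a different logical type from its inputs.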

[GitHub] [arrow] wesm commented on a change in pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7530: URL: https://github.com/apache/arrow/pull/7530#discussion_r444564799 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -39,7 +40,7 @@ namespace arrow { namespace compute { template -class TestBinar

[GitHub] [arrow] wesm commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648482567 Example use in Python: ``` In [14]: arr = pa.array(pd.date_range('2000-01-01', periods=20))

[GitHub] [arrow] github-actions[bot] commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648484942 https://issues.apache.org/jira/browse/ARROW-8934 This is an automated message from the Apache Git Serv

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648507985 > We can decide to stipulate that union types never have non-valid values at the Union cell level, only at the child cell level. But then a union value cannot be "made null" by

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648510708 > That would be my preference. I'm OK with this. We would need to act quickly to try to pull this off for the release. I can start a DISCUSS thread and work up a patch with the

[GitHub] [arrow] wesm commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-23 Thread GitBox
wesm commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-648514762 Does this impact IPC? This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow] wesm commented on pull request #6592: ARROW-8089: [C++] Port the toolchain build from Appveyor to Github Actions

2020-06-23 Thread GitBox
wesm commented on pull request #6592: URL: https://github.com/apache/arrow/pull/6592#issuecomment-648515003 @kszucs do you intend to keep working on this? I'll close the PR until it can be rehabilitated This is an automated
