[GitHub] [arrow] houqp commented on pull request #7501: ARROW-9192: [Rust] run clippy to lint arrow crate in CI

2020-06-23 Thread GitBox
houqp commented on pull request #7501: URL: https://github.com/apache/arrow/pull/7501#issuecomment-648576776 @kszucs let me know if there is anything I can do to help move it to the main lint workflow. This is an automated

[GitHub] [arrow] wesm commented on pull request #7501: ARROW-9192: [Rust] run clippy to lint arrow crate in CI

2020-06-23 Thread GitBox
wesm commented on pull request #7501: URL: https://github.com/apache/arrow/pull/7501#issuecomment-648562595 Hm, I think this lint step should be merged into the main Lint workflow. @kszucs can you help?

[GitHub] [arrow] wesm closed pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm closed pull request #7530: URL: https://github.com/apache/arrow/pull/7530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] wesm commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648561822 +1

[GitHub] [arrow] wesm closed pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
wesm closed pull request #7528: URL: https://github.com/apache/arrow/pull/7528

[GitHub] [arrow] wesm commented on pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7528: URL: https://github.com/apache/arrow/pull/7528#issuecomment-648561165 +1

[GitHub] [arrow] hcoona commented on pull request #7493: ARROW-9183: [C++] Fix build with clang & old libstdc++.

2020-06-23 Thread GitBox
hcoona commented on pull request #7493: URL: https://github.com/apache/arrow/pull/7493#issuecomment-648542491 Got it. It seems that gcc 4.8.5 was released later than gcc 4.9.2 and gcc 5.1. I'll investigate further how to detect the missing `std::atomic_load` overloads.

[GitHub] [arrow] sagnikc-dremio commented on a change in pull request #7402: ARROW-9099: [C++][Gandiva] Implement trim function for string

2020-06-23 Thread GitBox
sagnikc-dremio commented on a change in pull request #7402: URL: https://github.com/apache/arrow/pull/7402#discussion_r444588872 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -284,6 +284,49 @@ const char* reverse_utf8(gdv_int64 context, const char* data,

[GitHub] [arrow] kou commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-23 Thread GitBox
kou commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-648518023 I've added a workaround we already used: https://github.com/apache/arrow/pull/7449/commits/782499f8641da4a23d86125bcc812546107f2ce5 But it doesn't solve this yet. I'm

[GitHub] [arrow] wesm commented on pull request #6592: ARROW-8089: [C++] Port the toolchain build from Appveyor to Github Actions

2020-06-23 Thread GitBox
wesm commented on pull request #6592: URL: https://github.com/apache/arrow/pull/6592#issuecomment-648515003 @kszucs do you intend to keep working on this? I'll close the PR until it can be rehabilitated

[GitHub] [arrow] wesm closed pull request #6592: ARROW-8089: [C++] Port the toolchain build from Appveyor to Github Actions

2020-06-23 Thread GitBox
wesm closed pull request #6592: URL: https://github.com/apache/arrow/pull/6592

[GitHub] [arrow] wesm commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-06-23 Thread GitBox
wesm commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-648514762 Does this impact IPC?

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648510708 > That would be my preference. I'm OK with this. We would need to act quickly to try to pull this off for the release. I can start a DISCUSS thread and work up a patch with

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648507985 > We can decide to stipulate that union types never have non-valid values at the Union cell level, only at the child cell level. But then a union value cannot be "made null" by

[GitHub] [arrow] github-actions[bot] commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648484942 https://issues.apache.org/jira/browse/ARROW-8934

[GitHub] [arrow] wesm commented on pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on pull request #7530: URL: https://github.com/apache/arrow/pull/7530#issuecomment-648482567 Example use in Python: ``` In [14]: arr = pa.array(pd.date_range('2000-01-01', periods=20))

[GitHub] [arrow] wesm commented on a change in pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7530: URL: https://github.com/apache/arrow/pull/7530#discussion_r444564799 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -39,7 +40,7 @@ namespace arrow { namespace compute { template -class

[GitHub] [arrow] wesm opened a new pull request #7530: ARROW-8934: [C++] Enable `compute::Subtract` with timestamp inputs to return duration

2020-06-23 Thread GitBox
wesm opened a new pull request #7530: URL: https://github.com/apache/arrow/pull/7530 I also did a little bit of cleaning, moving some stuff into `arrow::compute::internal`.
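
The semantics named in this PR title (subtracting one timestamp from another yields a duration, not another timestamp) can be illustrated with a plain-Python analogue. This is only a sketch using the standard library's `datetime` and `timedelta`; it is not the Arrow `compute::Subtract` kernel, the function name `subtract_timestamps` is hypothetical, and the null-propagation rule shown here is an assumption for illustration.

```python
from datetime import datetime, timedelta


def subtract_timestamps(left, right):
    """Element-wise left - right over two equal-length timestamp lists.

    Each result is a timedelta (a duration), never a datetime.
    Assumption for this sketch: a None on either side yields None.
    """
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    return [
        None if a is None or b is None else a - b
        for a, b in zip(left, right)
    ]


# Three consecutive days, each measured against the same reference instant.
reference = datetime(2000, 1, 1)
days = [reference + timedelta(days=i) for i in range(3)]
durations = subtract_timestamps(days, [reference] * 3)
```

The result type differs from the input type on purpose: a difference of two instants has no epoch, so it is modelled as a duration.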

[GitHub] [arrow] wesm closed pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm closed pull request #7529: URL: https://github.com/apache/arrow/pull/7529

[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
kszucs commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648473447 I’m going to update the bot tomorrow.

[GitHub] [arrow] github-actions[bot] commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472074 https://issues.apache.org/jira/browse/ARROW-8025

[GitHub] [arrow] wesm opened a new pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm opened a new pull request #7529: URL: https://github.com/apache/arrow/pull/7529

[GitHub] [arrow] github-actions[bot] commented on pull request #7528: ARROW-8933: [C++] Trim redundant generated code from compute/kernels/vector_hash.cc

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7528: URL: https://github.com/apache/arrow/pull/7528#issuecomment-648472075 https://issues.apache.org/jira/browse/ARROW-8933

[GitHub] [arrow] wesm commented on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm commented on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds

[GitHub] [arrow] wesm edited a comment on pull request #7529: ARROW-8025: [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7529: URL: https://github.com/apache/arrow/pull/7529#issuecomment-648472135 I'll merge this ASAP to minimize the number of broken builds

[GitHub] [arrow] wesm closed pull request #7470: ARROW-8025: [C++] Implement cast from String to Binary

2020-06-23 Thread GitBox
wesm closed pull request #7470: URL: https://github.com/apache/arrow/pull/7470

[GitHub] [arrow] wesm opened a new pull request #7528: ARROW-8933: [C++] Trim redundant generated code from vector_hash.cc

2020-06-23 Thread GitBox
wesm opened a new pull request #7528: URL: https://github.com/apache/arrow/pull/7528 Since hashing doesn't know the difference between int64, uint64, float64, or timestamp when it comes to performing its work, there's no need to generate identical compiled code for each of these logical
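
The reasoning in this description can be sketched in Python: if a hash-based kernel only looks at a value's raw bytes, one implementation per byte width is enough, and logical types of the same width can share it. This is a minimal illustration, not the code in `vector_hash.cc`; the names `PHYSICAL_FORMATS`, `unique_physical`, and `unique` are hypothetical.

```python
import struct

# Hypothetical table: every logical type here is 8 bytes wide, so all of
# them can be fed to the same width-based implementation.
PHYSICAL_FORMATS = {
    "int64": "<q",
    "uint64": "<Q",
    "float64": "<d",
    "timestamp": "<q",  # modelled as a signed 64-bit count since the epoch
}


def unique_physical(raw_values):
    """The single shared implementation: deduplicate raw 8-byte values,
    keeping first-seen order. It never inspects the logical type."""
    seen = {}
    for raw in raw_values:
        seen.setdefault(raw, None)
    return list(seen)


def unique(values, logical_type):
    """Thin per-type wrapper: encode to bytes, reuse the shared code, decode."""
    fmt = PHYSICAL_FORMATS[logical_type]
    raw = [struct.pack(fmt, v) for v in values]
    return [struct.unpack(fmt, r)[0] for r in unique_physical(raw)]
```

Note that this sketch compares bytes only, so float subtleties are ignored: `0.0` and `-0.0` have different byte patterns and would be treated as distinct.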

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648465709 Thanks, I'll take a look tonight. Hopefully this should be enough of a clue.

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648454941 The bug seems to be in the BitRunReader ![image](https://user-images.githubusercontent.com/329591/85470398-95a9fa00-b574-11ea-99c4-3f06db4a0179.png)

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648453423 @emkornfield I'm sort of at a dead end here, hopefully the above gives you some clues about where there might be a problem

[GitHub] [arrow] github-actions[bot] commented on pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7527: URL: https://github.com/apache/arrow/pull/7527#issuecomment-648451652 https://issues.apache.org/jira/browse/ARROW-7018

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648451899 there seems to be a situation where the bit run has more values than are needed to fulfill the call to `GetSpaced`
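
The situation described in this comment (a run of bits that is longer than the number of values the caller still needs) can be illustrated with a small sketch. It is not the Arrow `BitRunReader` or the Parquet `GetSpaced` implementation; `bit_runs` and `decode_spaced` are hypothetical names, and the clamping shown is only one plausible way to avoid decoding more than was requested.

```python
from itertools import groupby, islice


def bit_runs(bits):
    """Yield (is_set, length) for each run of identical validity bits."""
    for value, group in groupby(bits):
        yield bool(value), sum(1 for _ in group)


def decode_spaced(validity_bits, values, num_requested):
    """Fill at most num_requested slots: valid slots take the next value,
    null slots become None. Each run is clamped to the slots still needed,
    so a long run cannot push the output past the request."""
    out = []
    value_iter = iter(values)
    for is_set, length in bit_runs(validity_bits):
        remaining = num_requested - len(out)
        if remaining <= 0:
            break
        take = min(length, remaining)  # the clamp
        if is_set:
            out.extend(islice(value_iter, take))
        else:
            out.extend([None] * take)
    return out


validity = [1, 1, 0, 0, 0, 1, 1, 1]
values = [10, 20, 30, 40, 50]
partial = decode_spaced(validity, values, 4)
```

Without the `min(length, remaining)` clamp, the second run (three nulls) would produce five slots for a request of four, which is the kind of overrun the comment describes.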

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444533246 ## File path: docs/source/developers/contributing.rst ## @@ -76,46 +96,83 @@ visibility. They may add a "Fix version" to indicate that they're

[GitHub] [arrow] nealrichardson commented on a change in pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7527: URL: https://github.com/apache/arrow/pull/7527#discussion_r444530279 ## File path: r/src/array_from_vector.cpp ## @@ -159,6 +159,9 @@ struct VectorToArrayConverter { if (s == NA_STRING) {

[GitHub] [arrow] nealrichardson opened a new pull request #7527: ARROW-7018: [R] Non-UTF-8 data in Arrow <--> R conversion

2020-06-23 Thread GitBox
nealrichardson opened a new pull request #7527: URL: https://github.com/apache/arrow/pull/7527 Sprinkles `Rf_translateCharUTF8` in a few places. I tried to add tests for all of the different scenarios I could think of where we could have non-UTF strings. Also includes `$` and `[[`

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648446373 I'm able to reproduce the error in VS and set breakpoints, I got this far to see that GetBatchWithDictSpaced has decoded more values than it was asked to

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648439435 FTR I'm OK with dropping the top-level validity bitmap from Union, especially if it helps us move forward

[GitHub] [arrow] wesm edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are

[GitHub] [arrow] wesm commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
wesm commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648435911 > @wesm why would we have validity at both the top level and the inner level? Well, the way the specification is written * _All_ nested types including union are

[GitHub] [arrow] jacques-n commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be uncha

2020-06-23 Thread GitBox
jacques-n commented on a change in pull request #6402: URL: https://github.com/apache/arrow/pull/6402#discussion_r444514257 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java ## @@ -751,55 +757,57 @@ private void

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648428724 Adding to my previous comments: if only at the top level, I'm not sure what the ramifications of that would be for the Java codebase. I think it would require a fairly massive

[GitHub] [arrow] bkietz commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r444509707 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; } -Result

[GitHub] [arrow] jacques-n commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even means

[GitHub] [arrow] jacques-n edited a comment on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-23 Thread GitBox
jacques-n edited a comment on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-648427018 I'm really struggling with these changes. I don't understand why there is a validity buffer at the union level as well as at the cell level. I'm not sure what it even

[GitHub] [arrow] wesm closed pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm closed pull request #7525: URL: https://github.com/apache/arrow/pull/7525

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410864 +1. We can work on performance smithing in follow up PRs

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648410615 I looked at the Parquet read/write benchmarks, the differences look like mostly noise to me ``` benchmark baseline contender

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
fsaintjacques commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r92049 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; }

[GitHub] [arrow] github-actions[bot] commented on pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7526: URL: https://github.com/apache/arrow/pull/7526#issuecomment-648401641 https://issues.apache.org/jira/browse/ARROW-9146

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7526: URL: https://github.com/apache/arrow/pull/7526#discussion_r86902 ## File path: cpp/src/arrow/dataset/file_parquet.cc ## @@ -357,13 +355,20 @@ static inline Result> AugmentRowGroups( return row_groups; }

[GitHub] [arrow] bkietz opened a new pull request #7526: ARROW-9146: [C++][Dataset] Lazily store fragment physical schema

2020-06-23 Thread GitBox
bkietz opened a new pull request #7526: URL: https://github.com/apache/arrow/pull/7526 The physical schema is required to validate predicates used for filtering row groups based on statistics. It can also be explicitly provided to ensure that if no row groups satisfy the predicate

[GitHub] [arrow] maxburke commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
maxburke commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r70083 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }
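
The PR title's point is that a negative millisecond count is not an error: it simply denotes an instant before the Unix epoch. A minimal Python sketch of that idea follows. The PR itself is Rust and relies on chrono; the function name `millis_to_datetime` is hypothetical and this is not the code under review.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime(millis):
    """Convert milliseconds since the epoch to a UTC datetime.

    Negative inputs are valid and map to instants before 1970-01-01,
    so there is no need to reject or panic on them.
    """
    return EPOCH + timedelta(milliseconds=millis)


just_before_epoch = millis_to_datetime(-1)
one_day_before = millis_to_datetime(-86_400_000)
```

Adding a signed offset to the epoch avoids any special case for negative values, which is why a sign check in front of the conversion is unnecessary.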

[GitHub] [arrow] paddyhoran closed pull request #7466: ARROW-9158: [Rust][Datafusion] projection physical plan compilation should preserve nullability

2020-06-23 Thread GitBox
paddyhoran closed pull request #7466: URL: https://github.com/apache/arrow/pull/7466

[GitHub] [arrow] paddyhoran commented on a change in pull request #7500: ARROW-9191: [Rust] Do not panic when milliseconds is less than zero as chrono can handle…

2020-06-23 Thread GitBox
paddyhoran commented on a change in pull request #7500: URL: https://github.com/apache/arrow/pull/7500#discussion_r43777 ## File path: rust/parquet/src/record/api.rs ## @@ -893,16 +893,6 @@ mod tests { assert_eq!(row, Field::TimestampMillis(123854406)); }

[GitHub] [arrow] wesm closed pull request #7518: ARROW-9138: [Docs][Format] Make sure format version is hard coded in the docs

2020-06-23 Thread GitBox
wesm closed pull request #7518: URL: https://github.com/apache/arrow/pull/7518

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] [WIP] Create test to receive RecordBatch for different endian

2020-06-23 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-648320579 Are there any comments about this approach for preparing test cases between different endians? cc @pitrou @wesm If not, I will prepare other tests (but disabled now) with this

[GitHub] [arrow] wesm edited a comment on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm edited a comment on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the initial visitor_inline.h changes gcc-8: ``` benchmark baseline

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648279829 Here's the sort benchmarks prior to the visitor_inline.h changes gcc-8: ``` benchmark baseline

[GitHub] [arrow] bkietz closed pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-23 Thread GitBox
bkietz closed pull request #7513: URL: https://github.com/apache/arrow/pull/7513

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648270948 OK I'm done twiddling this, here is the latest comparison of the hash benchmarks versus master with gcc-8: ``` benchmark baseline

[GitHub] [arrow] rjzamora edited a comment on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora edited a comment on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I

[GitHub] [arrow] rjzamora commented on pull request #7523: ARROW-8733: [Python][Dataset] Expose statistics of ParquetFileFragment::RowGroupInfo

2020-06-23 Thread GitBox
rjzamora commented on pull request #7523: URL: https://github.com/apache/arrow/pull/7523#issuecomment-648269136 Thanks for working on this @jorisvandenbossche ! This does seem like the functionality needed by Dask. To test my understanding (and for the sake of discussion), I am

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444333447 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444330998 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] bkietz commented on pull request #7493: ARROW-9183: [C++] Fix build with clang & old libstdc++.

2020-06-23 Thread GitBox
bkietz commented on pull request #7493: URL: https://github.com/apache/arrow/pull/7493#issuecomment-648252136 Hmm, there's a failure building with GCC 4.8 https://github.com/apache/arrow/pull/7493/checks?check_run_id=791725319#step:9:534 The `#ifdef` condition seems to be failing to

[GitHub] [arrow] alippai commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO filesystem dataset documentation

2020-06-23 Thread GitBox
alippai commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648247832 Thanks, now I understand. So the pairing with toxiproxy is for the testing :)) That's what you wrote, I just misunderstood

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444322251 ## File path: docs/source/developers/contributing.rst ## @@ -76,46 +96,83 @@ visibility. They may add a "Fix version" to indicate that they're

[GitHub] [arrow] fsaintjacques commented on pull request #7517: ARROW-1682: [Doc] Expand S3/MinIO filesystem dataset documentation

2020-06-23 Thread GitBox
fsaintjacques commented on pull request #7517: URL: https://github.com/apache/arrow/pull/7517#issuecomment-648244980 I can't comment on the production quality of MinIO since I've never used it in such a scenario. I meant this for reference to other developers who want to test the S3

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444320449 ## File path: docs/source/developers/contributing.rst ## @@ -168,11 +274,15 @@ remote repo still holds the old history, you would need to do a

[GitHub] [arrow] nealrichardson commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444318497 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444311774 ## File path: r/R/table.R ## @@ -202,7 +210,27 @@ Table$create <- function(..., schema = NULL) { #' @export as.data.frame.Table <- function(x,

[GitHub] [arrow] github-actions[bot] commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648240180 https://issues.apache.org/jira/browse/ARROW-9214

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648238512 Here are some vector-hash benchmarks comparing this branch with master. The performance "regressions" are for the 99%-100% null cases, I'll take a quick look at these in the

[GitHub] [arrow] wesm commented on pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm commented on pull request #7525: URL: https://github.com/apache/arrow/pull/7525#issuecomment-648235922 Here's what I see in the sort benchmarks with this patch compared with 7ed698b94, the patch right before the visitor_inline.h changes ```

[GitHub] [arrow] wesm opened a new pull request #7525: ARROW-9214: [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline

2020-06-23 Thread GitBox
wesm opened a new pull request #7525: URL: https://github.com/apache/arrow/pull/7525

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444308097 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] nealrichardson commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444306795 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-23 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-648230676 I'm not sure what the MSVC failure is about but I'll debug locally

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444302116 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444295036 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444293158 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] lionel- commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
lionel- commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444292367 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444288120 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] wesm commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444285972 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem abstraction

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444283172 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] romainfrancois commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r444281970 ## File path: r/src/array_from_vector.cpp ## @@ -1067,12 +1110,22 @@ std::shared_ptr InferArrowTypeFromVector(SEXP x) { if (Rf_inherits(x,

[GitHub] [arrow] romainfrancois commented on a change in pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois commented on a change in pull request #7524: URL: https://github.com/apache/arrow/pull/7524#discussion_r444273703 ## File path: r/tests/testthat/test-Table.R ## @@ -334,5 +334,5 @@ test_that("Table metadata", { test_that("Table handles null type

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7520: ARROW-9189: [Website] Improve contributor guide

2020-06-23 Thread GitBox
jorisvandenbossche commented on a change in pull request #7520: URL: https://github.com/apache/arrow/pull/7520#discussion_r444268553 ## File path: docs/source/developers/contributing.rst ## @@ -124,29 +181,72 @@ To contribute a patch: `ARROW-767: [C++] Filesystem

[GitHub] [arrow] github-actions[bot] commented on pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
github-actions[bot] commented on pull request #7524: URL: https://github.com/apache/arrow/pull/7524#issuecomment-648198565 https://issues.apache.org/jira/browse/ARROW-8899

[GitHub] [arrow] romainfrancois opened a new pull request #7524: ARROW-8899 [R] Add R metadata like pandas metadata for round-trip fidelity

2020-06-23 Thread GitBox
romainfrancois opened a new pull request #7524: URL: https://github.com/apache/arrow/pull/7524 ``` r library(arrow, warn.conflicts = FALSE) tab <- Table$create( a = structure(1:4, foo = "bar"), b = haven::labelled(1:4, label = "description") ) tab$metadata$r #>

[GitHub] [arrow] wesm commented on a change in pull request #7321: ARROW-8985: [Format][DONOTMERGE] RFC Proposed Decimal::byteWidth field for forward compatibility

2020-06-23 Thread GitBox
wesm commented on a change in pull request #7321: URL: https://github.com/apache/arrow/pull/7321#discussion_r444249804 ## File path: format/Schema.fbs ## @@ -134,11 +134,20 @@ table FixedSizeBinary { table Bool { } +/// Exact decimal value represented as an integer value

[GitHub] [arrow] jorisvandenbossche commented on pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7395: URL: https://github.com/apache/arrow/pull/7395#issuecomment-648165633 More comments on this? (apart from ensuring the tests pass) I should probably still add it to the filesystem docs.

[GitHub] [arrow] wesm closed pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm closed pull request #7516: URL: https://github.com/apache/arrow/pull/7516

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-23 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-648162680 +1. The bot changes can't be done here so going to go ahead and merge this so I can use it more easily without having to switch branches (to use this branch) before running

[GitHub] [arrow] wesm closed pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
wesm closed pull request #7522: URL: https://github.com/apache/arrow/pull/7522

[GitHub] [arrow] wesm commented on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h

2020-06-23 Thread GitBox
wesm commented on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-648147535 thanks @pitrou and @cyb70289 -- I will spend a little time on the count-sort implementation and post a new patch

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche edited a comment on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed, and the code looks good to me

[GitHub] [arrow] jorisvandenbossche commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

2020-06-23 Thread GitBox
jorisvandenbossche commented on pull request #7522: URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247 Was just testing it, and can also confirm the case from the issue is fixed
