[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073922961 ## arrow-data/src/data.rs: ## @@ -1446,6 +1493,40 @@ impl ArrayData { }) } +/// Validates that each value in run_ends array is posittive and strictl
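The quoted doc comment describes a validation rule for run-end encoded data: each value in the `run_ends` array must be positive and strictly increasing. Below is a minimal sketch of that rule in Python rather than the PR's Rust. It also shows the logical-to-physical lookup that strictly increasing run ends make possible. The names `validate_run_ends`, `physical_index` and `decode` are hypothetical and are not arrow-rs APIs. The sketch also assumes the logical length equals the last run end.

```python
from bisect import bisect_right
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def validate_run_ends(run_ends: Sequence[int]) -> None:
    """Raise ValueError unless every run end is positive and strictly increasing."""
    prev = 0  # the first run end must exceed 0, later ones their predecessor
    for i, end in enumerate(run_ends):
        if end <= prev:
            raise ValueError(f"run_ends[{i}] = {end} must be greater than {prev}")
        prev = end


def physical_index(run_ends: Sequence[int], logical: int) -> int:
    """Return the index of the run (and of its value) covering `logical`."""
    if not run_ends or not 0 <= logical < run_ends[-1]:
        raise IndexError(logical)
    # Run i covers logical slots [run_ends[i - 1], run_ends[i]), so the covering
    # run is the first one whose end is strictly greater than `logical`.
    return bisect_right(run_ends, logical)


def decode(run_ends: Sequence[int], values: Sequence[T]) -> List[T]:
    """Expand a run-end encoded pair of arrays back into plain values."""
    validate_run_ends(run_ends)
    if len(run_ends) != len(values):
        raise ValueError("run_ends and values must have the same length")
    out: List[T] = []
    start = 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out
```

For example, `run_ends = [2, 5, 6]` with `values = ["a", "b", "c"]` decodes to `["a", "a", "b", "b", "b", "c"]`, and logical slot 4 falls in run 1. The binary search is only valid because the run ends are strictly increasing, which is one reason to validate them up front.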

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073922055 ## arrow-array/src/types.rs: ## @@ -240,6 +240,17 @@ impl ArrowDictionaryKeyType for UInt32Type {} impl ArrowDictionaryKeyType for UInt64Type {} +/// A subtype of

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073920270 ## arrow-array/src/builder/primitive_ree_array_builder.rs: ## @@ -0,0 +1,218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow-datafusion] alamb merged pull request #4945: Minor: Reduce even more redundancy creating window_agg in sort_enforcement tests

2023-01-18 Thread GitBox
alamb merged PR #4945: URL: https://github.com/apache/arrow-datafusion/pull/4945 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arro

[GitHub] [arrow-datafusion] alamb merged pull request #4971: Add DataFusionError::Substrait variant to DataFusionError enum

2023-01-18 Thread GitBox
alamb merged PR #4971: URL: https://github.com/apache/arrow-datafusion/pull/4971

[GitHub] [arrow-datafusion] alamb closed issue #4970: Add DataFusionError:Substrait enum variant

2023-01-18 Thread GitBox
alamb closed issue #4970: Add DataFusionError:Substrait enum variant URL: https://github.com/apache/arrow-datafusion/issues/4970

[GitHub] [arrow-datafusion] alamb closed issue #4941: Incorrect error message when there is no valid fields

2023-01-18 Thread GitBox
alamb closed issue #4941: Incorrect error message when there is no valid fields URL: https://github.com/apache/arrow-datafusion/issues/4941

[GitHub] [arrow-datafusion] alamb merged pull request #4942: fix: `FieldNotFound` error message without valid fields

2023-01-18 Thread GitBox
alamb merged PR #4942: URL: https://github.com/apache/arrow-datafusion/pull/4942

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073918231 ## arrow-array/src/builder/primitive_ree_array_builder.rs: ## @@ -0,0 +1,218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow-datafusion] alamb commented on pull request #4942: fix: `FieldNotFound` error message without valid fields

2023-01-18 Thread GitBox
alamb commented on PR #4942: URL: https://github.com/apache/arrow-datafusion/pull/4942#issuecomment-1387521261 Thanks @DDtKey !

[GitHub] [arrow] jp0317 commented on a diff in pull request #33736: PARQUET-2232: [C++] Add an api to ColumnChunkMetaData to indicate if the column chunk uses a bloom filter

2023-01-18 Thread GitBox
jp0317 commented on code in PR #33736: URL: https://github.com/apache/arrow/pull/33736#discussion_r1073917977 ## cpp/src/parquet/metadata.h: ## @@ -171,6 +171,7 @@ class PARQUET_EXPORT ColumnChunkMetaData { const std::vector& encodings() const; const std::vector& encodin

[GitHub] [arrow-datafusion] alamb merged pull request #4916: Improve documentation for ExprVisitor, port simple uses to new walking function

2023-01-18 Thread GitBox
alamb merged PR #4916: URL: https://github.com/apache/arrow-datafusion/pull/4916

[GitHub] [arrow-rs] alamb commented on issue #3540: ADBC FFI types and possible abstraction

2023-01-18 Thread GitBox
alamb commented on issue #3540: URL: https://github.com/apache/arrow-rs/issues/3540#issuecomment-1387517552 I think adding ADBC FFI as a crate (like `arrow-flight`) would be a reasonable idea.

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073911714 ## arrow-array/src/array/mod.rs: ## @@ -579,6 +582,20 @@ pub fn make_array(data: ArrayData) -> ArrayRef { } dt => panic!("Unexpected dictionar

[GitHub] [arrow] alamb commented on pull request #33716: WIP: DO NOT MERGE: Apache Arrow Flight SQL adapter for PostgreSQL plan

2023-01-18 Thread GitBox
alamb commented on PR #33716: URL: https://github.com/apache/arrow/pull/33716#issuecomment-1387510800 It might help to start this document with an expected use case. I originally thought it was to allow clients that spoke the postgres front-end/back-end protocol to connect to a

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073907408 ## arrow-array/src/builder/primitive_ree_array_builder.rs: ## @@ -0,0 +1,218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073907114 ## arrow-array/src/builder/primitive_ree_array_builder.rs: ## @@ -0,0 +1,218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073906282 ## arrow-array/src/builder/primitive_ree_array_builder.rs: ## @@ -0,0 +1,218 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

[GitHub] [arrow] pitrou commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
pitrou commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073904290 ## cpp/src/arrow/array/data.cc: ## @@ -195,6 +195,7 @@ int GetNumBuffers(const DataType& type) { case Type::NA: case Type::STRUCT: case Type::FIXED_SIZE_L

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3553: feat: Add `RunEndEncodedArray`

2023-01-18 Thread GitBox
viirya commented on code in PR #3553: URL: https://github.com/apache/arrow-rs/pull/3553#discussion_r1073901528 ## arrow-array/src/array/run_end_encoded_array.rs: ## @@ -0,0 +1,518 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

[GitHub] [arrow] zhztheplayer commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
zhztheplayer commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1387500741 I see. I'll find out to what extent the allocation listener can be leveraged in this feature. Thanks for the suggestion.

[GitHub] [arrow] alamb commented on pull request #33716: WIP: DO NOT MERGE: Apache Arrow Flight SQL adapter for PostgreSQL plan

2023-01-18 Thread GitBox
alamb commented on PR #33716: URL: https://github.com/apache/arrow/pull/33716#issuecomment-1387498547 Thank you for starting this discussion, @kou. We are interested in this feature at InfluxData and may have more details to share soon.

[GitHub] [arrow] nealrichardson commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
nealrichardson commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1073891681 ## r/NEWS.md: ## @@ -19,6 +19,77 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of

[GitHub] [arrow] nealrichardson commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
nealrichardson commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1073890698 ## r/NEWS.md: ## @@ -19,6 +19,77 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of

[GitHub] [arrow] nealrichardson commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
nealrichardson commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1073888186 ## r/NEWS.md: ## @@ -19,6 +19,77 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of

[GitHub] [arrow] nealrichardson commented on pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
nealrichardson commented on PR #33748: URL: https://github.com/apache/arrow/pull/33748#issuecomment-1387485745 Thanks for doing this. A couple of overall suggestions to reduce the monotony: * There's a lot of "now" in the bullets (e.g. "can now be" over and over). "now" is implied be

[GitHub] [arrow] lidavidm commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
lidavidm commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1387485633 Ah, ok. That sounds fine. I would explore the allocation listener further, though...

[GitHub] [arrow-adbc] lidavidm merged pull request #354: ci: refactor pipelines and deduplicate work

2023-01-18 Thread GitBox
lidavidm merged PR #354: URL: https://github.com/apache/arrow-adbc/pull/354

[GitHub] [arrow-adbc] lidavidm commented on pull request #354: ci: refactor pipelines and deduplicate work

2023-01-18 Thread GitBox
lidavidm commented on PR #354: URL: https://github.com/apache/arrow-adbc/pull/354#issuecomment-1387483066 I'm going to merge this for now, and we can continue extending it (e.g. in #355).

[GitHub] [arrow] zhztheplayer commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
zhztheplayer commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1387480855 > I don't think it needs a new interface. I meant a new implementation of the interface, e.g. `class AutoCleanBufferAllocator implements BufferAllocator`

[GitHub] [arrow] nealrichardson commented on issue #33758: SparkR Arrow "Hello World" Error: 'write_arrow' is not an exported object from 'namespace:arrow'

2023-01-18 Thread GitBox
nealrichardson commented on issue #33758: URL: https://github.com/apache/arrow/issues/33758#issuecomment-1387478876 `write_arrow()` was deprecated in arrow 1.0.0 (July 2020) and removed in arrow 9.0.0 (https://arrow.apache.org/docs/r/news/index.html#arrow-900). (For that matter, [SparkR was

[GitHub] [arrow-ballista] andygrove merged pull request #593: Python: add method to get explain output as a string

2023-01-18 Thread GitBox
andygrove merged PR #593: URL: https://github.com/apache/arrow-ballista/pull/593

[GitHub] [arrow-ballista] andygrove merged pull request #615: Refactor scheduler main

2023-01-18 Thread GitBox
andygrove merged PR #615: URL: https://github.com/apache/arrow-ballista/pull/615

[GitHub] [arrow-ballista] andygrove merged pull request #614: Refactor executor main

2023-01-18 Thread GitBox
andygrove merged PR #614: URL: https://github.com/apache/arrow-ballista/pull/614

[GitHub] [arrow] jorisvandenbossche commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387471941 Ah, no, a Buffer has a size, but this is the size in bytes, which in case of a null bitmap doesn't give you the length of the array.
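To illustrate why a bitmap's byte size does not give the length: a validity bitmap stores one bit per slot, so its minimal size is `ceil(length / 8)` bytes, and several lengths share the same byte count. Below is a small Python sketch with hypothetical helper names. It assumes the minimal (unpadded) size. Real Arrow buffers are often padded beyond that, which only widens the ambiguity.

```python
def bitmap_bytes_for_length(length: int) -> int:
    """Minimum number of bytes a validity bitmap needs for `length` slots."""
    return (length + 7) // 8


def lengths_for_bitmap_bytes(nbytes: int) -> range:
    """All lengths whose minimal bitmap is exactly `nbytes` bytes long."""
    if nbytes == 0:
        return range(0, 1)
    return range(8 * (nbytes - 1) + 1, 8 * nbytes + 1)


def is_valid(bitmap: bytes, i: int) -> bool:
    """Read slot `i` using Arrow's least-significant-bit-first numbering."""
    return bool((bitmap[i >> 3] >> (i & 7)) & 1)
```

A 2-byte bitmap fits any length from 9 to 16, and nothing inside the bitmap records which one was meant. The length has to come from somewhere else, such as the children or an explicit argument.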

[GitHub] [arrow] lidavidm commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
lidavidm commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073869283 ## cpp/src/arrow/array/data.cc: ## @@ -195,6 +195,7 @@ int GetNumBuffers(const DataType& type) { case Type::NA: case Type::STRUCT: case Type::FIXED_SIZE

[GitHub] [arrow] wjones127 commented on issue #33605: [Python] Parquet file writes incorrect booleans on large file with default write batch size

2023-01-18 Thread GitBox
wjones127 commented on issue #33605: URL: https://github.com/apache/arrow/issues/33605#issuecomment-1387471506 We can close it 👍

[GitHub] [arrow] ursabot commented on pull request #33647: MINOR: [C++] Remove unnecessary code in MultipathLevelBuilder::Write

2023-01-18 Thread GitBox
ursabot commented on PR #33647: URL: https://github.com/apache/arrow/pull/33647#issuecomment-1387471077 ['Python', 'R'] benchmarks have high level of regressions. [test-mac-arm](https://conbench.ursa.dev/compare/runs/79a33a1dbe9b4eb98135b6b3846a56c5...9acf60beb5324e1b84679c160f4bc7c5/)

[GitHub] [arrow] raulcd commented on issue #29743: [Dev] merge_arrow_pr.py script fails if head pointer can't be checked out

2023-01-18 Thread GitBox
raulcd commented on issue #29743: URL: https://github.com/apache/arrow/issues/29743#issuecomment-1387470601 You are correct; that should not happen anymore. The `git checkout` command was removed when we moved to the GH API: https://github.com/apache/arrow/commit/01e4ad095a7649afc7a7316447bc

[GitHub] [arrow] nealrichardson commented on issue #32920: [Dev] More descriptive error output in merge script

2023-01-18 Thread GitBox
nealrichardson commented on issue #32920: URL: https://github.com/apache/arrow/issues/32920#issuecomment-1387470379 Since we're no longer using Jira, can we close this issue, or is there something more general about version checking we need? @raulcd @assignUser @rok

[GitHub] [arrow-datafusion] ozankabak commented on pull request #4924: Unify Row hash and hash implementation

2023-01-18 Thread GitBox
ozankabak commented on PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924#issuecomment-1387469509 #4973 looks good to me, we will help with migrating once the foundational tools are in place.

[GitHub] [arrow-rs] ursabot commented on pull request #3556: Expose Inner FlightServiceClient on FlightSqlServiceClient (#3551)

2023-01-18 Thread GitBox
ursabot commented on PR #3556: URL: https://github.com/apache/arrow-rs/pull/3556#issuecomment-1387468698 Benchmark runs are scheduled for baseline = 40837a87c6a7ae177298fe3fcc0e83aaf678640e and contender = 3ae1c728b266c1ba801409eb7f4b901285783e94. 3ae1c728b266c1ba801409eb7f4b901285783e94 i

[GitHub] [arrow] ursabot commented on pull request #33647: MINOR: [C++] Remove unnecessary code in MultipathLevelBuilder::Write

2023-01-18 Thread GitBox
ursabot commented on PR #33647: URL: https://github.com/apache/arrow/pull/33647#issuecomment-1387468672 Benchmark runs are scheduled for baseline = 1a8272001deb5be1053bb737493c368f659bce09 and contender = c525b57295e5ab9cb9e2591342d0b01a357660a3. c525b57295e5ab9cb9e2591342d0b01a357660a3 is

[GitHub] [arrow] droher commented on issue #33605: [Python] Parquet file writes incorrect booleans on large file with default write batch size

2023-01-18 Thread GitBox
droher commented on issue #33605: URL: https://github.com/apache/arrow/issues/33605#issuecomment-1387468379 @wjones127 I'm not sure whether I should close this or leave that to the maintainers, but from my end it looks like there's no action needed on Arrow's end - this was pretty conclusiv

[GitHub] [arrow] nealrichardson commented on issue #29743: [Dev] merge_arrow_pr.py script fails if head pointer can't be checked out

2023-01-18 Thread GitBox
nealrichardson commented on issue #29743: URL: https://github.com/apache/arrow/issues/29743#issuecomment-1387466762 @kou @raulcd @assignUser is this still valid? We use the GH API now so this shouldn't be happening.

[GitHub] [arrow] paleolimbot commented on a diff in pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
paleolimbot commented on code in PR #33748: URL: https://github.com/apache/arrow/pull/33748#discussion_r1073852493 ## r/NEWS.md: ## @@ -19,6 +19,77 @@ # arrow 10.0.1.9000 +## New features + +### Docs + +* A substantial reorganisation, rewrite of and addition to, many of the

[GitHub] [arrow] AlenkaF commented on pull request #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-18 Thread GitBox
AlenkaF commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1387464145 > @AlenkaF @jorisvandenbossche would one of you like to help @akshaysu12 on this? Sure, I would love to help! Will look at the issue and the current state of the PR tomorrow and ho

[GitHub] [arrow-rs] tustvold closed issue #3551: [FlightSQL] Allow access to underlying FlightClient

2023-01-18 Thread GitBox
tustvold closed issue #3551: [FlightSQL] Allow access to underlying FlightClient URL: https://github.com/apache/arrow-rs/issues/3551

[GitHub] [arrow-rs] tustvold merged pull request #3556: Expose Inner FlightServiceClient on FlightSqlServiceClient (#3551)

2023-01-18 Thread GitBox
tustvold merged PR #3556: URL: https://github.com/apache/arrow-rs/pull/3556

[GitHub] [arrow] nealrichardson merged pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox
nealrichardson merged PR #19706: URL: https://github.com/apache/arrow/pull/19706

[GitHub] [arrow] 0x26res commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
0x26res commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387460124 Hmm, actually in C++ the null_bitmap is in the shape of a buffer: `std::shared_ptr<Buffer> null_bitmap`. Is it possible to infer the size of the mask from a Buffer?

[GitHub] [arrow] pitrou commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
pitrou commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073852207 ## cpp/src/arrow/array/data.cc: ## @@ -195,6 +195,7 @@ int GetNumBuffers(const DataType& type) { case Type::NA: case Type::STRUCT: case Type::FIXED_SIZE_L

[GitHub] [arrow-adbc] lidavidm commented on pull request #354: ci: refactor pipelines and deduplicate work

2023-01-18 Thread GitBox
lidavidm commented on PR #354: URL: https://github.com/apache/arrow-adbc/pull/354#issuecomment-1387456071 Punting on running the CGO tests on Windows since they currently fail in some way I can't reproduce

[GitHub] [arrow] paleolimbot commented on issue #33094: [R] Intermittent memory leaks in the valgrind nightly test

2023-01-18 Thread GitBox
paleolimbot commented on issue #33094: URL: https://github.com/apache/arrow/issues/33094#issuecomment-1387452294 I've done some bisecting of the tests in the pursuit of a minimal reproducer here. Since it appears that the docker image used in the nightly test is the only way to reproduce th

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073843729 ## cpp/src/arrow/array/data.cc: ## @@ -195,6 +195,7 @@ int GetNumBuffers(const DataType& type) { case Type::NA: case Type::STRUCT: case Type::FIXED_SIZ

[GitHub] [arrow-adbc] lidavidm merged pull request #357: fix(go/adbc/driver/flightsql): cnxn should implement PostInitOptions

2023-01-18 Thread GitBox
lidavidm merged PR #357: URL: https://github.com/apache/arrow-adbc/pull/357

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
alamb commented on code in PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958#discussion_r1073840524 ## datafusion/core/src/execution/context.rs: ## @@ -1729,6 +1741,15 @@ impl SessionState { query.statement_to_plan(statement) } +/// Creates

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073830197 ## cpp/src/arrow/array/array_encoded.h: ## @@ -0,0 +1,91 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-adbc] zeroshade opened a new pull request, #357: fix(go/driver/flightsql): cnxn should implement PostInitOptions

2023-01-18 Thread GitBox
zeroshade opened a new pull request, #357: URL: https://github.com/apache/arrow-adbc/pull/357 The Go flightsql driver connection object should implement PostInitOptions to allow setting options on it.

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073827629 ## cpp/src/arrow/array/array_encoded.h: ## @@ -0,0 +1,91 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-ballista] adriangb commented on issue #173: Add support for Python UDFs in distributed queries

2023-01-18 Thread GitBox
adriangb commented on issue #173: URL: https://github.com/apache/arrow-ballista/issues/173#issuecomment-1387435057 > I think being able to run Python UDFs is a must, almost not even worth having Python UDF support if dependencies can't be used. This is just my opinion and not a fact.

[GitHub] [arrow-datafusion] ursabot commented on pull request #4940: Propagate planning error back to user

2023-01-18 Thread GitBox
ursabot commented on PR #4940: URL: https://github.com/apache/arrow-datafusion/pull/4940#issuecomment-1387431789 Benchmark runs are scheduled for baseline = 2fdc7b836741fa62f89e9828da65cdda98814fb1 and contender = ba9fc129b11fe08dd2be98a4cd7915d230e29488. ba9fc129b11fe08dd2be98a4cd7915d23

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073822032 ## cpp/src/arrow/array/array_encoded.h: ## @@ -0,0 +1,91 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow] jorisvandenbossche commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387425744 That sounds like an option as well.

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073818108 ## cpp/src/arrow/array.h: ## @@ -34,10 +34,15 @@ /// @{ /// @} +/// \defgroup encoded-arrays Concrete classes for encoded arrays Review Comment: ```suggestion

[GitHub] [arrow-datafusion] saikrishna1-bidgely commented on a diff in pull request #4908: added a method to read multiple locations at the same time.

2023-01-18 Thread GitBox
saikrishna1-bidgely commented on code in PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#discussion_r1073817337 ## datafusion/core/src/execution/context.rs: ## @@ -551,12 +551,14 @@ impl SessionContext { } /// Creates a [`DataFrame`] for reading a

[GitHub] [arrow] felipecrv commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
felipecrv commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073816052 ## cpp/src/arrow/CMakeLists.txt: ## @@ -140,13 +140,16 @@ set(ARROW_SRCS array/array_binary.cc array/array_decimal.cc array/array_dict.cc +array/ar

[GitHub] [arrow-datafusion] alamb merged pull request #4940: Propagate planning error back to user

2023-01-18 Thread GitBox
alamb merged PR #4940: URL: https://github.com/apache/arrow-datafusion/pull/4940

[GitHub] [arrow] jorisvandenbossche commented on issue #15153: OSError: Couldn't deserialize thrift: TProtocolException

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15153: URL: https://github.com/apache/arrow/issues/15153#issuecomment-1387397032 > I am not sure where to start and what could be the root cause. Some questions that might help you to get to a reproducible example, or might give some pointers of

[GitHub] [arrow-datafusion] alamb commented on pull request #4916: Improve documentation for ExprVisitor, port simple uses to new walking function

2023-01-18 Thread GitBox
alamb commented on PR #4916: URL: https://github.com/apache/arrow-datafusion/pull/4916#issuecomment-1387395362 > Looks great to me. > Sorry for reviewing it so late. I've been really busy with work recently. Thanks @jackwener, I totally understand!

[GitHub] [arrow-rs] tustvold commented on issue #3548: Parquet Field IDs

2023-01-18 Thread GitBox
tustvold commented on issue #3548: URL: https://github.com/apache/arrow-rs/issues/3548#issuecomment-1387389302 We don't currently support this AFAIK, but would welcome PRs to add support for it

[GitHub] [arrow] jorisvandenbossche commented on issue #15138: Clustered By -- how?

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15138: URL: https://github.com/apache/arrow/issues/15138#issuecomment-1387388664 Also Apache Iceberg has a "bucket[N]" transform for partitioning: https://iceberg.apache.org/spec/#partitioning
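For reference, the Iceberg spec defines the bucket transform as `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`. It hashes `int` and `long` values as their 8-byte little-endian representation and strings as their UTF-8 bytes. Below is a self-contained Python sketch of that idea. It is not code from Iceberg or Arrow, and the function names are hypothetical. In practice you would normally rely on an existing MurmurHash3 implementation rather than hand-rolling one.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by `r` bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Unsigned 32-bit MurmurHash3 (x86_32 variant)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks :]
    if tail:
        # The 1-3 leftover bytes are combined little-endian, as in the reference code.
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix so every input bit affects every output bit.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket_long(value: int, num_buckets: int) -> int:
    """Bucket an int/long by hashing its 8-byte little-endian form."""
    h = murmur3_x86_32(struct.pack("<q", value))
    # Masking with 0x7FFFFFFF matches Java's `hash & Integer.MAX_VALUE`.
    return (h & 0x7FFFFFFF) % num_buckets


def bucket_string(value: str, num_buckets: int) -> int:
    """Bucket a string by hashing its UTF-8 bytes."""
    return (murmur3_x86_32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets
```

Because the bucket depends only on the value and `N`, rows with equal keys always land in the same bucket. That is the "clustered by" property the issue asks about. The Iceberg spec's hash appendix lists reference values that an implementation like this can be checked against.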

[GitHub] [arrow] 0x26res commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
0x26res commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387386703 Should we change it to: ``` if (children.size() == 0 && null_bitmap == nullptr) { return Status::Invalid("Can't infer struct array length with 0 child arrays");
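The proposed check can be sketched in Python. The helper `infer_struct_length` and its `length` parameter are hypothetical and are not the Arrow C++ or pyarrow API. A bitmap's byte size alone does not determine an array length. For that reason, when there are no children, this sketch asks for an explicit length instead of keying on whether a null bitmap is present. That is a design assumption, not what the proposal says.

```python
from typing import Optional, Sequence


def infer_struct_length(child_lengths: Sequence[int], length: Optional[int] = None) -> int:
    """Hypothetical helper: pick a struct array's length from its children,
    requiring an explicit `length` when there are no children at all."""
    if length is None:
        if not child_lengths:
            # Same situation the proposed Status::Invalid guards against.
            raise ValueError("Can't infer struct array length with 0 child arrays")
        length = child_lengths[0]
    # Every child must line up with the struct's own length.
    for i, n in enumerate(child_lengths):
        if n != length:
            raise ValueError(f"child {i} has length {n}, expected {length}")
    return length
```

With an explicit length, a zero-field struct of five rows is representable (`infer_struct_length([], length=5)`). Omitting both children and length still fails loudly, as the proposed error intends.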

[GitHub] [arrow] jorisvandenbossche commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387381724 Nope, the C++ method doesn't handle this yet: https://github.com/apache/arrow/blob/359f28ba9d62a5e8456d92dfbe5b16b790019edd/cpp/src/arrow/array/array_nested.cc#L530-L540

[GitHub] [arrow] icexelloss commented on a diff in pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-18 Thread GitBox
icexelloss commented on code in PR #33676: URL: https://github.com/apache/arrow/pull/33676#discussion_r1073794299 ## cpp/src/arrow/compute/exec/asof_join_node_test.cc: ## @@ -662,19 +662,19 @@ TRACED_TEST_P(AsofJoinBasicTest, TestBasic1, { runner(basic_test); }) -BasicTest

[GitHub] [arrow] jorisvandenbossche commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387378851 It might be worth first checking if the C++ code can handle this nowadays, otherwise handling it there sounds good (maybe a `pa.array([[]]*len(mask), pa.struct([]))` might

[GitHub] [arrow] icexelloss commented on pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-18 Thread GitBox
icexelloss commented on PR #33676: URL: https://github.com/apache/arrow/pull/33676#issuecomment-1387378034 The original ask is to make the direction of "tolerance" here: https://github.com/apache/arrow/blob/fc53ff8c5e2797c1a5a99db7f3aece80dd0b9f3e/cpp/src/arrow/compute/exec/options.h#L529
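To make the two tolerance directions concrete, here is a minimal sketch that matches a single left row against sorted right-side timestamps. The sign convention used here (a non-negative tolerance looks into the future, a negative one into the past) and the helper name `asof_match` are assumptions for illustration only; the convention Arrow actually adopts is whatever the linked `options.h` documents.

```python
import bisect
from typing import List, Optional


def asof_match(left_time: int, right_times: List[int], tolerance: int) -> Optional[int]:
    """Return the index of the right row to join with `left_time`, or None.

    Assumed convention (illustration only): tolerance >= 0 searches the future
    window [left_time, left_time + tolerance]; tolerance < 0 searches the past
    window [left_time + tolerance, left_time]. `right_times` must be sorted.
    """
    if tolerance >= 0:
        # Earliest right row at or after the left timestamp.
        i = bisect.bisect_left(right_times, left_time)
        if i < len(right_times) and right_times[i] <= left_time + tolerance:
            return i
    else:
        # Latest right row at or before the left timestamp.
        i = bisect.bisect_right(right_times, left_time) - 1
        if i >= 0 and right_times[i] >= left_time + tolerance:
            return i
    return None
```

A single signed parameter keeps the option compact, but it means every caller has to know what the sign means, which is why a documented, standardized convention matters.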

[GitHub] [arrow] pitrou commented on issue #14932: [Python] Expose streaming JSON reader

2023-01-18 Thread GitBox
pitrou commented on issue #14932: URL: https://github.com/apache/arrow/issues/14932#issuecomment-1387377337 Thanks @akshaysu12 for notifying me! I've cc'ed the relevant people on your PR.

[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
avantgardnerio commented on code in PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958#discussion_r1073789901 ## datafusion/core/src/execution/context.rs: ## @@ -1636,22 +1636,34 @@ impl SessionState { self } -/// Creates a [`LogicalPlan`] fr

[GitHub] [arrow] pitrou commented on pull request #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-18 Thread GitBox
pitrou commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1387376655 @AlenkaF @jorisvandenbossche would one of you like to help @akshaysu12 on this?

[GitHub] [arrow] jorisvandenbossche commented on issue #15178: [Python] `Table.slice` not updating `pandas_metadata`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15178: URL: https://github.com/apache/arrow/issues/15178#issuecomment-1387374398 The pandas metadata is quite a primitive solution, initially implemented to ensure correct roundtrips between pandas <-> arrow/parquet. That works for exact roundtrips, but
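For context on why a slice leaves this metadata stale: when pyarrow converts a DataFrame with a default `RangeIndex`, the stored pandas metadata records that index in `index_columns` as a dict with `"kind": "range"` plus `start`, `stop` and `step`, so after slicing those values still describe the original table. A minimal sketch of what adjusting them would involve, where `slice_range_index_metadata` is a hypothetical helper and not a pyarrow function:

```python
import copy


def slice_range_index_metadata(meta: dict, offset: int, length: int) -> dict:
    """Return a copy of pandas metadata with any RangeIndex narrowed to a slice.

    Hypothetical helper: assumes `meta` follows the layout pyarrow writes, where
    a RangeIndex entry in "index_columns" is a dict with "kind": "range" plus
    "start", "stop" and "step".
    """
    new_meta = copy.deepcopy(meta)  # never mutate the caller's metadata
    for entry in new_meta.get("index_columns", []):
        if isinstance(entry, dict) and entry.get("kind") == "range":
            start, stop, step = entry["start"], entry["stop"], entry["step"]
            # Row i of the table carries the label start + i * step.
            total = len(range(start, stop, step))
            first = min(offset, total)
            last = min(offset + length, total)
            entry["start"] = start + first * step
            entry["stop"] = start + last * step
    return new_meta
```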

[GitHub] [arrow] akshaysu12 commented on issue #14932: [Python] Expose streaming JSON reader

2023-01-18 Thread GitBox
akshaysu12 commented on issue #14932: URL: https://github.com/apache/arrow/issues/14932#issuecomment-1387373796 @pitrou sorry for the delay! I added a Draft PR here: https://github.com/apache/arrow/pull/33761 It's missing documentation but I was hoping to get a look to make sure I'm

[GitHub] [arrow] 0x26res commented on issue #15109: [Python] Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
0x26res commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387374201 @jorisvandenbossche should I try to handle the mask argument in that branch of the code?

[GitHub] [arrow] lidavidm commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
lidavidm commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1387373744 I don't think it needs a new interface.

[GitHub] [arrow] github-actions[bot] commented on pull request #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1387372055 :warning: GitHub issue #14932 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33761: URL: https://github.com/apache/arrow/pull/33761#issuecomment-1387371991 * Closes: #14932

[GitHub] [arrow-ballista] avantgardnerio commented on pull request #614: Refactor executor main

2023-01-18 Thread GitBox
avantgardnerio commented on PR #614: URL: https://github.com/apache/arrow-ballista/pull/614#issuecomment-1387371780 > Is this approach reasonable as a starting point to simplify doing this? Yes, I definitely think we should make this change. That being said, it can be painful ex

[GitHub] [arrow] akshaysu12 opened a new pull request, #33761: GH-14932: Add python bindings for JSON streaming reader

2023-01-18 Thread GitBox
akshaysu12 opened a new pull request, #33761: URL: https://github.com/apache/arrow/pull/33761 ### What changes are included in this PR? This PR adds a new python function open_json() that allows for opening a streaming reader to a json file. Arguments for open_json() are the same as for
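The point of a streaming reader is to hand back record batches one at a time rather than materializing the whole file. A minimal pure-Python sketch of that idea for newline-delimited JSON, where `iter_json_batches` and `batch_size` are hypothetical names and not the signature of the proposed `open_json()`:

```python
import io
import json
from typing import Dict, Iterator, List, TextIO


def _to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row dicts into columns, filling missing keys with None."""
    names: List[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return {name: [row.get(name) for row in rows] for name in names}


def iter_json_batches(stream: TextIO, batch_size: int) -> Iterator[Dict[str, list]]:
    """Yield column-oriented batches of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pending: List[dict] = []
    for line in stream:
        if not line.strip():
            continue  # skip blank lines between records
        pending.append(json.loads(line))
        if len(pending) == batch_size:
            yield _to_columns(pending)
            pending = []
    if pending:  # flush the final, possibly short, batch
        yield _to_columns(pending)


sample = io.StringIO('{"a": 1}\n{"a": 2, "b": "x"}\n{"a": 3}\n')
for batch in iter_json_batches(sample, batch_size=2):
    print(batch)
# {'a': [1, 2], 'b': [None, 'x']}
# {'a': [3]}
```

Unlike an Arrow reader, which works against a schema, this sketch simply derives column names per batch; it only illustrates the batch-at-a-time control flow.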

[GitHub] [arrow] raulcd commented on pull request #33755: GH-33754: [CI] Install brewfile dependencies for verification task jobs on M1

2023-01-18 Thread GitBox
raulcd commented on PR #33755: URL: https://github.com/apache/arrow/pull/33755#issuecomment-1387365352 @kou this seems to fix the issue. As this is only because of our M1's setup I don't think it requires a new RC for 11.0.0 to be created. We should probably back-port it to the maintenance

[GitHub] [arrow] zhztheplayer commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
zhztheplayer commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1387359104 Thanks. Would you suggest adding a new implementation of the `BufferAllocator` interface (e.g. `AutoCleanBufferAllocator`)? If so, I can try to start from there. > the GC does not

[GitHub] [arrow] nealrichardson commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox
nealrichardson commented on code in PR #19706: URL: https://github.com/apache/arrow/pull/19706#discussion_r1073774481 ## r/src/expression.cpp: ## @@ -46,13 +46,26 @@ std::shared_ptr compute___expr__call(std::string func_name, compute::call(std::move(func_name), std::move

[GitHub] [arrow-datafusion] ursabot commented on pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
ursabot commented on PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958#issuecomment-1387356471 Benchmark runs are scheduled for baseline = 7062c2efcf43447545bd5d752ae0692a0e160a31 and contender = 2fdc7b836741fa62f89e9828da65cdda98814fb1. 2fdc7b836741fa62f89e9828da65cdda9

[GitHub] [arrow] jorisvandenbossche commented on issue #15109: Can't create a non empty StructArray with no field using `StructArray.from_array`

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15109: URL: https://github.com/apache/arrow/issues/15109#issuecomment-1387347190 We currently have a special case for this in `StructArray.from_arrays` that causes this: https://github.com/apache/arrow/blob/359f28ba9d62a5e8456d92dfbe5b16b790019e

[GitHub] [arrow-datafusion] DataPsycho opened a new issue, #4974: Is there a describe method on DataFrame like Polars?

2023-01-18 Thread GitBox
DataPsycho opened a new issue, #4974: URL: https://github.com/apache/arrow-datafusion/issues/4974 I was trying to generate a summary on the data frame but could not find any relevant methods like `describe` or `summary` which can provide [this kind](https://docs.rs/polars/latest/polars/fram
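A `describe`-style summary is essentially a handful of per-column aggregates. A minimal pure-Python sketch over plain lists of numbers, where `describe` is a hypothetical helper and not DataFusion's or Polars' API, that reports count, null count, mean, sample standard deviation, min and max:

```python
import statistics
from typing import Dict, List, Optional


def describe(columns: Dict[str, List[Optional[float]]]) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute simple summary statistics for each numeric column, ignoring None."""
    summary: Dict[str, Dict[str, Optional[float]]] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary[name] = {
            "count": len(present),
            "null_count": len(values) - len(present),
            "mean": statistics.mean(present) if present else None,
            # Sample standard deviation needs at least two values.
            "std": statistics.stdev(present) if len(present) > 1 else None,
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return summary
```

In a query engine each of these would be an aggregate expression evaluated in one pass per column; the sketch only shows which statistics such a method typically reports.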

[GitHub] [arrow] jorisvandenbossche commented on issue #15105: Add compute between comparison

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #15105: URL: https://github.com/apache/arrow/issues/15105#issuecomment-1387343216 Duplicate of #25881

[GitHub] [arrow-datafusion] avantgardnerio closed issue #4957: Split `create_logical_plan()` into parsing and planning

2023-01-18 Thread GitBox
avantgardnerio closed issue #4957: Split `create_logical_plan()` into parsing and planning URL: https://github.com/apache/arrow-datafusion/issues/4957

[GitHub] [arrow-datafusion] avantgardnerio merged pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
avantgardnerio merged PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958

[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
avantgardnerio commented on code in PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958#discussion_r1073760608 ## datafusion/core/src/execution/context.rs: ## @@ -1729,6 +1741,15 @@ impl SessionState { query.statement_to_plan(statement) } +///

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3556: Expose Inner FlightServiceClient on FlightSqlServiceClient (#3551)

2023-01-18 Thread GitBox
alamb commented on code in PR #3556: URL: https://github.com/apache/arrow-rs/pull/3556#discussion_r1073759794 ## arrow-flight/src/sql/client.rs: ## @@ -124,16 +122,18 @@ impl FlightSqlServiceClient { let flight_client = FlightServiceClient::new(channel); Flight

[GitHub] [arrow-rs] alamb commented on pull request #3556: Expose Inner FlightServiceClient on FlightSqlServiceClient (#3551)

2023-01-18 Thread GitBox
alamb commented on PR #3556: URL: https://github.com/apache/arrow-rs/pull/3556#issuecomment-1387340363 I also have big plans for this client over the next few releases. This is a nice step forward -- This is an automated message from the Apache Git Service. To respond to the message, plea
