Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-17 Thread via GitHub
kou commented on code in PR #41180: URL: https://github.com/apache/arrow/pull/41180#discussion_r1568301645 ## docs/source/format/DissociatedIPC.rst: ## @@ -0,0 +1,335 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements.

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-17 Thread via GitHub
kou commented on code in PR #41180: URL: https://github.com/apache/arrow/pull/41180#discussion_r1568300130 ## docs/source/format/DissociatedIPC.rst: ## @@ -0,0 +1,335 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements.

Re: [I] [Java] EPL Dependencies [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on issue #40896: URL: https://github.com/apache/arrow/issues/40896#issuecomment-2060528094 @lidavidm I followed the thread. I think I should be able to, but probably next week? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] [Java] EPL Dependencies [arrow]

2024-04-17 Thread via GitHub
lidavidm commented on issue #40896: URL: https://github.com/apache/arrow/issues/40896#issuecomment-2060570597 sure, there's no rush

Re: [I] [Java] EPL Dependencies [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on issue #40896: URL: https://github.com/apache/arrow/issues/40896#issuecomment-2060572959 Thanks!

[I] [minor] make parquet prune tests more readable [arrow-datafusion]

2024-04-17 Thread via GitHub
Ted-Jiang opened a new issue, #10111: URL: https://github.com/apache/arrow-datafusion/issues/10111 It took me a while to convince myself that this was actually setting up the scenario as described. I eventually found it here:

[PR] [minor] make parquet prune tests more readable [arrow-datafusion]

2024-04-17 Thread via GitHub
Ted-Jiang opened a new pull request, #10112: URL: https://github.com/apache/arrow-datafusion/pull/10112 ## Which issue does this PR close? Closes #10111. ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568395445 ## rust2/core/src/options.rs: ## @@ -0,0 +1,492 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-17 Thread via GitHub
milenkovicm commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2060661362 First of all I'd agree with @tustvold and @alamb, arrow maintainers should not take this responsibility, HDFS store is a bit more complicated than object stores. IMHO,

Re: [PR] Update dependabot to consider datafusion-cli [arrow-datafusion]

2024-04-17 Thread via GitHub
Jefffrey commented on code in PR #10108: URL: https://github.com/apache/arrow-datafusion/pull/10108#discussion_r1568432570 ## .github/dependabot.yml: ## @@ -28,6 +28,20 @@ updates: # arrow is bumped manually - dependency-name: "arrow*" update-types:

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2060690995 > Having had some discussion with my colleagues I don't think this approach would actually be more performant. But thanks for the heads up TBC I was referring to webhdfs

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
mbrobbel commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568400988 ## rust2/core/src/options.rs: ## @@ -0,0 +1,492 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] GH-40547: [R][Docs] Add a non-technical introductory R vignette to the functioning of arrow [arrow]

2024-04-17 Thread via GitHub
oliviermeslin commented on code in PR #40982: URL: https://github.com/apache/arrow/pull/40982#discussion_r1568429659 ## r/vignettes/informal_introduction.Rmd: ## @@ -0,0 +1,297 @@ +--- +title: Getting started with Apache Arrow and R +description: > + An informal introduction

Re: [I] [Java] Remove deprecated code from Arrow Java [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on issue #15167: URL: https://github.com/apache/arrow/issues/15167#issuecomment-2060675804 @kou https://github.com/apache/arrow/issues/41250 Need to list this one as well.

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568451821 ## rust2/core/src/options.rs: ## @@ -0,0 +1,492 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat(python): Implement extension type support [arrow-nanoarrow]

2024-04-17 Thread via GitHub
jorisvandenbossche commented on PR #431: URL: https://github.com/apache/arrow-nanoarrow/pull/431#issuecomment-2060743694 Do we want to go the route of a registry and having users define their own? For nanoarrow, I would personally stick to what it in essence is: metadata (and we have

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568477669 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
mbrobbel commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568529728 ## rust2/core/Cargo.toml: ## @@ -0,0 +1,29 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] GH-39482: [JS] Refactor imports [arrow]

2024-04-17 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39483: URL: https://github.com/apache/arrow/pull/39483#issuecomment-2060512581 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5abd9338589f5211ce833a73c2200690b20d37c1. There were

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
mbrobbel commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568362620 ## rust2/core/src/error.rs: ## @@ -0,0 +1,166 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [I] [C++][FS][Azure] CopyFile doesn't work with Azure hierachical namespace support [arrow]

2024-04-17 Thread via GitHub
kou commented on issue #41095: URL: https://github.com/apache/arrow/issues/41095#issuecomment-2060480796 I couldn't find whether we should report this... So I've opened a question issue in Azure/azure-sdk-for-cpp: https://github.com/Azure/azure-sdk-for-cpp/issues/5542

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-04-17 Thread via GitHub
zanmato1984 commented on PR #40998: URL: https://github.com/apache/arrow/pull/40998#issuecomment-2060589435 Maybe a dumb question. Seems the column sorting will happen anyway in encoding [1], how can `Grouper` disrespect this fact and do the comparison with

Re: [PR] feat(python): Implement extension type support [arrow-nanoarrow]

2024-04-17 Thread via GitHub
jorisvandenbossche commented on PR #431: URL: https://github.com/apache/arrow-nanoarrow/pull/431#issuecomment-2060747347 (we could of course still provide some more ergonomic access to the extension name/metadata, e.g. by detecting if those keys are present, and in that case showing them

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
mbrobbel commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568491088 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the

Re: [PR] fix: duplicate output for HashJoinExec in CollectLeft mode [arrow-datafusion]

2024-04-17 Thread via GitHub
Ted-Jiang commented on code in PR #9757: URL: https://github.com/apache/arrow-datafusion/pull/9757#discussion_r1568312944 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -726,6 +747,8 @@ impl ExecutionPlan for HashJoinExec { context.clone(),

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568501581 ## rust2/core/Cargo.toml: ## @@ -0,0 +1,29 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568355944 ## rust2/core/src/error.rs: ## @@ -0,0 +1,166 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] [minor] make parquet prune tests more readable [arrow-datafusion]

2024-04-17 Thread via GitHub
Ted-Jiang closed pull request #10112: [minor] make parquet prune tests more readable URL: https://github.com/apache/arrow-datafusion/pull/10112

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-04-17 Thread via GitHub
ZhangHuiGui commented on PR #40998: URL: https://github.com/apache/arrow/pull/40998#issuecomment-2060699644 > are_cols_in_encoding_order=true It's a nice question! Use `are_cols_in_encoding_order=true` in Grouper to perform correct comparison. This process can be broken down

Re: [I] [Java] Remove deprecated code from Arrow Java [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on issue #15167: URL: https://github.com/apache/arrow/issues/15167#issuecomment-2060717399 One more to the list: https://github.com/apache/arrow/issues/41252

Re: [PR] ARROW-17255: [C++][Parquet] Add JSON canonical extension type [arrow]

2024-04-17 Thread via GitHub
rok commented on PR #13901: URL: https://github.com/apache/arrow/pull/13901#issuecomment-2060736393 I would now perhaps split out the format definition part (docs) and call a vote on the ML. @pitrou any thoughts at this point?

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568322398 ## rust2/core/src/error.rs: ## @@ -0,0 +1,166 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-17 Thread via GitHub
Silemo commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2060572274 > There has also been interest from delta-rs users, perhaps you can find some companions there to help :) There is where I am coming from. Seeing the small interest there is

Re: [PR] GH-41173: [Java] Add spotless configuration for Maven pom.xml files [arrow]

2024-04-17 Thread via GitHub
lidavidm merged PR #41174: URL: https://github.com/apache/arrow/pull/41174

Re: [I] [Java] Format pom.xml files using Apache Maven pom.xml 2008 convention [arrow]

2024-04-17 Thread via GitHub
lidavidm commented on issue #41173: URL: https://github.com/apache/arrow/issues/41173#issuecomment-2060716557 Issue resolved by pull request 41174 https://github.com/apache/arrow/pull/41174

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
lidavidm commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568455458 ## rust2/core/src/options.rs: ## @@ -0,0 +1,492 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] WIP: [Release] Verify release-16.0.0-rc0 [arrow]

2024-04-17 Thread via GitHub
raulcd commented on PR #41235: URL: https://github.com/apache/arrow/pull/41235#issuecomment-2060758071 > I think that this is not a blocker. I agree. I will send the vote. Thanks @kou

Re: [PR] GH-41231: [C#] Slice values array when writing a sliced list view array to IPC format [arrow]

2024-04-17 Thread via GitHub
github-actions[bot] commented on PR #41255: URL: https://github.com/apache/arrow/pull/41255#issuecomment-2060931495 :warning: GitHub issue #41231 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568525770 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] GH-41253: [C++] Feature: support filter before agg for acero. [arrow]

2024-04-17 Thread via GitHub
github-actions[bot] commented on PR #41254: URL: https://github.com/apache/arrow/pull/41254#issuecomment-2060831412 :warning: GitHub issue #41253 **has been automatically assigned in GitHub** to PR creator.

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-17 Thread via GitHub
Kimahriman commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2061002935 > 2. Second approach is to write native rust hdfs library and I believe @Kimahriman https://github.com/Kimahriman/hdfs-native is on the right track. I haven't use the library

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-04-17 Thread via GitHub
zanmato1984 commented on PR #40998: URL: https://github.com/apache/arrow/pull/40998#issuecomment-2061021793 > > are_cols_in_encoding_order=true > > It's a nice question! > > Use `are_cols_in_encoding_order=true` in Grouper to perform correct comparison. This process can be

Re: [I] Add ObjectStore::put_multipart_opts [arrow-rs]

2024-04-17 Thread via GitHub
tustvold closed issue #5435: Add ObjectStore::put_multipart_opts URL: https://github.com/apache/arrow-rs/issues/5435

Re: [I] [CI][Archery] Archery linking should also check for undefined symbols Linux [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on issue #40964: URL: https://github.com/apache/arrow/issues/40964#issuecomment-2061146195 Agreed!

Re: [PR] MINOR: [Go] Bump modernc.org/sqlite from 1.29.5 to 1.29.6 in /go [arrow]

2024-04-17 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41208: URL: https://github.com/apache/arrow/pull/41208#issuecomment-2061160036 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 61dde7133381897b5f779827d9c14f684814cb87. There were

Re: [I] Coerce parquet int96 timestamps to microsecond precision [arrow-rs]

2024-04-17 Thread via GitHub
liukun4515 commented on issue #5655: URL: https://github.com/apache/arrow-rs/issues/5655#issuecomment-2060992303 > @tustvold to my understanding it's actually microsecond precision but it's saved as a logical int96 I think in the parquet, the physical type of `int96` represent the

Re: [PR] Account for Timezone when Casting Timestamp to Date32 [arrow-rs]

2024-04-17 Thread via GitHub
liukun4515 commented on PR #5605: URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060927598 > Maybe we can add some way to the parquet arrow reader to override its choice of data type for certain columns to allow users to specify types for cases where it is not clear from

Re: [PR] ci(python): upload nightly python packages [arrow-nanoarrow]

2024-04-17 Thread via GitHub
jorisvandenbossche commented on PR #429: URL: https://github.com/apache/arrow-nanoarrow/pull/429#issuecomment-2061053175 The logs say that the upload was successful, but I am not sure _where_ they exactly got uploaded to .. In any case https://gemfury.com/arrow-nightlies/ is not showing

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
aljazerzen commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568737932 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] Flip interval field ordering (#5654) [arrow-rs]

2024-04-17 Thread via GitHub
pitrou commented on PR #5656: URL: https://github.com/apache/arrow-rs/pull/5656#issuecomment-2060854275 You can probably add some PyArrow integration test(s) in https://github.com/apache/arrow-rs/blob/master/arrow-pyarrow-integration-testing/tests/test_sql.py ? Both to

Re: [PR] Remove deprecated JSON writer [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on PR #5651: URL: https://github.com/apache/arrow-rs/pull/5651#issuecomment-2060980432 I intend to hold this until the next major release of arrow, which depending on the outcome of #5654 may be sooner than anticipated

Re: [I] Any plan to support JSON or JSONB? [arrow-datafusion]

2024-04-17 Thread via GitHub
samuelcolvin commented on issue #7845: URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2060995995 @alamb if you're interested in JSON parsing support I might be interested in contributing. We (Pydantic) maintain a very fast Rust JSON parser (generally

[I] Provide Arrow Schema Hint to Parquet Reader [arrow-rs]

2024-04-17 Thread via GitHub
tustvold opened a new issue, #5657: URL: https://github.com/apache/arrow-rs/issues/5657 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The parquet reader automatically uses an embedded arrow schema to hint type inference

Re: [I] Coerce parquet int96 timestamps to microsecond precision [arrow-rs]

2024-04-17 Thread via GitHub
liukun4515 commented on issue #5655: URL: https://github.com/apache/arrow-rs/issues/5655#issuecomment-2061005273 I find the definition of the int96 in the deprecated doc

Re: [PR] Account for Timezone when Casting Timestamp to Date32 [arrow-rs]

2024-04-17 Thread via GitHub
liukun4515 commented on PR #5605: URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060960080 > > Do we need to add or wrap the cast expr explicitly to the target timestamp column? > > Yes arrow does not have a notion of "local" timezone, users need to be explicit

Re: [PR] Fix large futures causing stack overflows [arrow-datafusion]

2024-04-17 Thread via GitHub
devinjdangelo commented on PR #10033: URL: https://github.com/apache/arrow-datafusion/pull/10033#issuecomment-2060982582 Thank you for working on this @sergiimk. I looked through the changes and they all look good to me. If this issue is blocking your ability to use the 37.0.0

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
aljazerzen commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568706748 ## rust2/core/src/options.rs: ## @@ -0,0 +1,500 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
aljazerzen commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568700678 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568791830 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] ARROW-17255: [C++][Parquet] Add JSON canonical extension type [arrow]

2024-04-17 Thread via GitHub
pitrou commented on PR #13901: URL: https://github.com/apache/arrow/pull/13901#issuecomment-2060830634 Yes, please make a separate PR for the format proposal.

Re: [PR] MINOR: [Go] Bump github.com/klauspost/compress from 1.17.7 to 1.17.8 in /go [arrow]

2024-04-17 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41207: URL: https://github.com/apache/arrow/pull/41207#issuecomment-2061055231 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7d19b832b3a1b369b07195c8d9254aaa48784a7a. There was 1

Re: [I] Any plan to support JSON or JSONB? [arrow-datafusion]

2024-04-17 Thread via GitHub
adriangb commented on issue #7845: URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2061047207 For what it’s worth I think having the ability to performantly parse JSON stored as a String or Binary is valuable in and of itself. You don’t always control how the data

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
mbrobbel commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568783103 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the

Re: [PR] GH-41229: [C++] FS: Support naive GCS Async Close [arrow]

2024-04-17 Thread via GitHub
mapleFU commented on PR #41232: URL: https://github.com/apache/arrow/pull/41232#issuecomment-2061176540 ``` /// \brief Close the stream asynchronously /// /// By default, this will just submit the synchronous Close() to the /// default I/O thread pool. Subclasses may

[PR] Fix AVG groups accummulator ignoring return type [arrow-datafusion]

2024-04-17 Thread via GitHub
gruuya opened a new pull request, #10114: URL: https://github.com/apache/arrow-datafusion/pull/10114 ## Which issue does this PR close? Closes #10113 . ## Rationale for this change ## What changes are included in this PR? Coerce AVG accumulator to designated

Re: [PR] MINOR: [Java] Bump org.apache.calcite.avatica:avatica from 1.24.0 to 1.25.0 in /java [arrow]

2024-04-17 Thread via GitHub
github-actions[bot] commented on PR #41212: URL: https://github.com/apache/arrow/pull/41212#issuecomment-2060897751 Revision: bed59960750b32f119d3627839473f6ff5be6bda Submitted crossbow builds: [ursacomputing/crossbow @

[PR] GH-41253: [C++] Feature: support filter before agg for acero. [arrow]

2024-04-17 Thread via GitHub
Light-City opened a new pull request, #41254: URL: https://github.com/apache/arrow/pull/41254 ### Rationale for this change In order to support more grammatical features, improve the filter before agg. https://www.postgresql.org/docs/current/tutorial-agg.html ###

Re: [PR] Account for Timezone when Casting Timestamp to Date32 [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on PR #5605: URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060977385 Filed #5657

Re: [PR] Add put_multipart_opts (#5435) [arrow-rs]

2024-04-17 Thread via GitHub
tustvold merged PR #5652: URL: https://github.com/apache/arrow-rs/pull/5652

Re: [PR] GH-39131: [JS] Add at() for array like types [arrow]

2024-04-17 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #40730: URL: https://github.com/apache/arrow/pull/40730#issuecomment-2060853553 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 7d184485aef00659f3616ab993e246eeb0c91e23. There were 6

Re: [I] Reuse `create_physical_sort_exprs` when creating `accumulator` for first/last [arrow-datafusion]

2024-04-17 Thread via GitHub
jayzhan211 commented on issue #10074: URL: https://github.com/apache/arrow-datafusion/issues/10074#issuecomment-2061149382 I would like to continue the discussion in #9972 since I don't have a good reason not to **support directly mutating physical-expr**. And the solution (moving

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568799063 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [I] Rust Interval definition incorrect [arrow-rs]

2024-04-17 Thread via GitHub
pitrou commented on issue #5654: URL: https://github.com/apache/arrow-rs/issues/5654#issuecomment-2060833519 > @pitrou do you know if the integration tests cover the interval types? Yes, they do.

Re: [PR] MINOR: [Go] Bump google.golang.org/grpc from 1.62.1 to 1.63.2 in /go [arrow]

2024-04-17 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41103: URL: https://github.com/apache/arrow/pull/41103#issuecomment-2060859057 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 12eb5a7d2ca6edae4a109e1c4ed30fadec9dfd0d. There were

Re: [PR] MINOR: [Java] Bump org.apache.calcite.avatica:avatica from 1.24.0 to 1.25.0 in /java [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on PR #41212: URL: https://github.com/apache/arrow/pull/41212#issuecomment-2060890408 @github-actions crossbow submit -g java

Re: [I] Any plan to support JSON or JSONB? [arrow-datafusion]

2024-04-17 Thread via GitHub
samuelcolvin commented on issue #7845: URL: https://github.com/apache/arrow-datafusion/issues/7845#issuecomment-2061121332 tiny update to my example above, I realised there’s a much better comparison query: ```sql -- datafusion SELECT count(*) FROM records where

Re: [PR] feat(python): Implement extension type support [arrow-nanoarrow]

2024-04-17 Thread via GitHub
paleolimbot commented on PR #431: URL: https://github.com/apache/arrow-nanoarrow/pull/431#issuecomment-2061169372 > Do we want to go the route of a registry and having users define their own? Great point! I'm still feeling my way through how users should interact with this.

Re: [PR] GH-41229: [C++] FS: Support naive GCS Async Close [arrow]

2024-04-17 Thread via GitHub
mapleFU commented on code in PR #41232: URL: https://github.com/apache/arrow/pull/41232#discussion_r1568780623 ## cpp/src/arrow/filesystem/gcsfs.cc: ## @@ -200,6 +201,16 @@ class GcsOutputStream : public arrow::io::OutputStream { return

Re: [PR] Account for Timezone when Casting Timestamp to Date32 [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on PR #5605: URL: https://github.com/apache/arrow-rs/pull/5605#issuecomment-2060932866 > Do we need to add or wrap the cast expr explicitly to the target timestamp column? Yes arrow does not have a notion of "local" timezone. Theoretically a DF frontend could add

[PR] GH-41231: [C#] Slice values array when writing a sliced list view array to IPC format [arrow]

2024-04-17 Thread via GitHub
adamreeve opened a new pull request, #41255: URL: https://github.com/apache/arrow/pull/41255 ### Rationale for this change Reduces IPC file sizes when writing sliced list view arrays. ### What changes are included in this PR? Updates `ArrowStreamWriter` so it only writes

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568710105 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-17 Thread via GitHub
alexandreyc commented on code in PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#discussion_r1568810336 ## rust2/core/src/lib.rs: ## @@ -0,0 +1,520 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] GH-41229: [C++] FS: Support naive GCS Async Close [arrow]

2024-04-17 Thread via GitHub
pitrou commented on PR #41232: URL: https://github.com/apache/arrow/pull/41232#issuecomment-2061242246 > Oh, the FileInterface has already implemented `CloseAsync` in this manner (but in the default IO thread pool), maybe I should rethink the syntax here Why are you doing this since it

[PR] GH-41256: [Format][Docs] Add a canonical extension type specification for a generic text data format (e.g. JSON) [arrow]

2024-04-17 Thread via GitHub
rok opened a new pull request, #41257: URL: https://github.com/apache/arrow/pull/41257 ### Rationale for this change As per #41256 this proposes a specification of a canonical extension type for a generic text data format that will enable storing text based formats such as JSON,

Re: [I] Make FixedSizedList Json serializable [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on issue #5568: URL: https://github.com/apache/arrow-rs/issues/5568#issuecomment-2061309527 `label_issue.py` automatically added labels {'arrow'} from #5646 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] panic when casting `ListArray` to `FixedSizeList` [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on issue #5642: URL: https://github.com/apache/arrow-rs/issues/5642#issuecomment-2061310249 `label_issue.py` automatically added labels {'arrow'} from #5643 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] Make `filter` in `filter_leaves` API propagate error [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on issue #5574: URL: https://github.com/apache/arrow-rs/issues/5574#issuecomment-2061309592 `label_issue.py` automatically added labels {'arrow'} from #5575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] parquet: "not yet implemented" error when codec is actually implemented but disabled [arrow-rs]

2024-04-17 Thread via GitHub
tustvold commented on issue #5520: URL: https://github.com/apache/arrow-rs/issues/5520#issuecomment-2061308858 `label_issue.py` automatically added labels {'parquet'} from #5521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] GH-41256: [Format][Docs] Add a canonical extension type specification for a generic text data format (e.g. JSON) [arrow]

2024-04-17 Thread via GitHub
lidavidm commented on PR #41257: URL: https://github.com/apache/arrow/pull/41257#issuecomment-2061348507 I don't see how they're the same. One is about specific structured data and one is a way for a producer to signal unknown types without erroring the whole operation or dropping data.

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2024-04-17 Thread via GitHub
joellubi commented on PR #38385: URL: https://github.com/apache/arrow/pull/38385#issuecomment-2061343107 @zeroshade It looks like you're good with this merging and there have been no objections. I'll merge this shortly. -- This is an automated message from the Apache Git Service. To

Re: [PR] GH-37720: [Go][FlightSQL] Add prepared statement handle to DoPut result [arrow]

2024-04-17 Thread via GitHub
matthewmturner commented on PR #40311: URL: https://github.com/apache/arrow/pull/40311#issuecomment-2061361363 @zeroshade just to confirm, I assume we're blocked on merging this until that unrelated change is resolved? -- This is an automated message from the Apache Git Service. To

Re: [PR] GH-40964: [CI][Archery] Archery linking should also check for undefined symbols Linux [arrow]

2024-04-17 Thread via GitHub
vibhatha commented on PR #40520: URL: https://github.com/apache/arrow/pull/40520#issuecomment-2061400394 @pitrou Sure, I will address these. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] GH-41256: [Format][Docs] Add a canonical extension type specification for a generic text data format (e.g. JSON) [arrow]

2024-04-17 Thread via GitHub
pitrou commented on PR #41257: URL: https://github.com/apache/arrow/pull/41257#issuecomment-2061402822 Sure, but where did you get that idea of a "generic type"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] GH-41256: [Format][Docs] Add a canonical extension type specification for a generic text data format (e.g. JSON) [arrow]

2024-04-17 Thread via GitHub
rok commented on PR #41257: URL: https://github.com/apache/arrow/pull/41257#issuecomment-2061520224 It was proposed on the [ML](https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j). -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Refactor `UnwrapCastInComparison` to implement `OptimizerRule::rewrite()` [arrow-datafusion]

2024-04-17 Thread via GitHub
peter-toth commented on PR #10087: URL: https://github.com/apache/arrow-datafusion/pull/10087#issuecomment-2061643740 @alamb, @jayzhan211 here is a follow-up PR to remove remaining `Expr` clones: https://github.com/apache/arrow-datafusion/pull/10115 -- This is an automated message from

Re: [PR] GH-37720: [Go][FlightSQL] Add prepared statement handle to DoPut result [arrow]

2024-04-17 Thread via GitHub
zeroshade commented on PR #40311: URL: https://github.com/apache/arrow/pull/40311#issuecomment-2061699159 @erratic-pattern We don't need to wait for that unrelated issue to be addressed. Once you address the merge conflict I'll merge this. -- This is an automated message from the Apache

Re: [PR] feat: Add manual test to calculate spark builtin functions coverage [arrow-datafusion-comet]

2024-04-17 Thread via GitHub
comphead commented on code in PR #263: URL: https://github.com/apache/arrow-datafusion-comet/pull/263#discussion_r1569118171 ## doc/spark_coverage.txt: ## @@ -0,0 +1,421 @@

Re: [PR] Fix AVG groups accummulator ignoring return type [arrow-datafusion]

2024-04-17 Thread via GitHub
gruuya commented on code in PR #10114: URL: https://github.com/apache/arrow-datafusion/pull/10114#discussion_r1568832792 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -3673,7 +3673,7 @@ physical_plan 11)ProjectionExec: expr=[1 as c, 3 as d]
