Re: [PR] GH-43693: [C++][Acero] Support AVX2 swiss join decoding [arrow]

2024-09-05 Thread via GitHub
zanmato1984 commented on PR #43832: URL: https://github.com/apache/arrow/pull/43832#issuecomment-272050 Apologies to anyone mistakenly pulled in by my careless merge/rebase. I'm cleaning up my branch now. Sorry :( -- This is an automated message from the Apache Git Service. To respond to the m

[PR] GH-43983: [C++][Parquet] Add support for arrow::ArrayStatistics: zero-copy types [arrow]

2024-09-05 Thread via GitHub
kou opened a new pull request, #43984: URL: https://github.com/apache/arrow/pull/43984 ### Rationale for this change Statistics is useful for fast processing. Target types: * `Int32` * `Int64` * `Float` * `Double` * `Timestamp[milli]` * `Timestamp[micro]`

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
hellishfire commented on PR #43968: URL: https://github.com/apache/arrow/pull/43968#issuecomment-244082 @vibhatha @lidavidm I rewrote the compare logic and related tests

Re: [PR] GH-43693: [C++][Acero] Support AVX2 swiss join decoding [arrow]

2024-09-05 Thread via GitHub
lidavidm commented on PR #43832: URL: https://github.com/apache/arrow/pull/43832#issuecomment-226527 @zanmato1984 might wanna take a second look at that merge/rebase

Re: [PR] GH-43979: [CI][C++][Dev] Add cpplint to pre-commit [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43982: URL: https://github.com/apache/arrow/pull/43982#issuecomment-2333267494 :warning: GitHub issue #43979 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-43979: [CI][C++][Dev] Add cpplint to pre-commit [arrow]

2024-09-05 Thread via GitHub
kou opened a new pull request, #43982: URL: https://github.com/apache/arrow/pull/43982 ### Rationale for this change cpplint isn't integrated with pre-commit yet. ### What changes are included in this PR? * Add cpplint configuration * Share configuration with pre-commi

Re: [PR] Add breaking change from #6043 to `CHANGELOG` [arrow-rs]

2024-09-05 Thread via GitHub
tustvold merged PR #6354: URL: https://github.com/apache/arrow-rs/pull/6354

Re: [PR] object_store/delimited: Fix `TrailingEscape` condition [arrow-rs]

2024-09-05 Thread via GitHub
tustvold merged PR #6265: URL: https://github.com/apache/arrow-rs/pull/6265

Re: [PR] `object_store::GetOptions` derive `Clone` [arrow-rs]

2024-09-05 Thread via GitHub
tustvold merged PR #6361: URL: https://github.com/apache/arrow-rs/pull/6361

Re: [PR] impl `From>` for `Buffer` [arrow-rs]

2024-09-05 Thread via GitHub
tustvold merged PR #6355: URL: https://github.com/apache/arrow-rs/pull/6355

Re: [PR] GH-43796: [C++] Indent preprocessor directives [arrow]

2024-09-05 Thread via GitHub
kou merged PR #43798: URL: https://github.com/apache/arrow/pull/43798

Re: [I] [Python] Add `__getattr__` to `pyarrow.compute` for typing [arrow]

2024-09-05 Thread via GitHub
zen-xu commented on issue #43285: URL: https://github.com/apache/arrow/issues/43285#issuecomment-2333217961 please install pyarrow stubs ```bash pip install "pyarrow-stubs>=17.2" ``` https://github.com/user-attachments/assets/ed6ad087-4ec7-4a36-b2c0-08228a64a87f

Re: [PR] GH-43944: [C++][Parquet] Add support for arrow::ArrayStatistics: non zero-copy int based types [arrow]

2024-09-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43945: URL: https://github.com/apache/arrow/pull/43945#issuecomment-2333196175 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 262d6f6f68814b6495b87a13cfba5fd9bf6c7d67. There were no

Re: [I] python: add a package for bigquery plugin [arrow-adbc]

2024-09-05 Thread via GitHub
lidavidm commented on issue #2110: URL: https://github.com/apache/arrow-adbc/issues/2110#issuecomment-2333194052 The Python package will also be available from conda-forge soon. https://github.com/conda-forge/arrow-adbc-split-feedstock/pull/28

Re: [PR] MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java Manual [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43980: URL: https://github.com/apache/arrow/pull/43980#issuecomment-2333190167 Revision: e3fc2de781e3d7aad403e392e702ffd7dfc9885f Submitted crossbow builds: [ursacomputing/crossbow @ actions-5fd8ce5a6e](https://github.com/ursacomputing/crossbow/bra

Re: [PR] MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java Manual [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on PR #43980: URL: https://github.com/apache/arrow/pull/43980#issuecomment-2333188320 @github-actions crossbow submit -g java

Re: [PR] MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java [arrow]

2024-09-05 Thread via GitHub
vibhatha closed pull request #43922: MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java URL: https://github.com/apache/arrow/pull/43922

Re: [PR] MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on PR #43922: URL: https://github.com/apache/arrow/pull/43922#issuecomment-2333188058 Closing in favor of https://github.com/apache/arrow/pull/43980

Re: [I] [Java] Gandiva Tests are failing due to linking issues [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on issue #43576: URL: https://github.com/apache/arrow/issues/43576#issuecomment-2333172696 Most of the failing tests are due to the `Could not create LLJIT instance` issue [1]. Also, the ProjectTest is mainly failing once. So it would be better to disable the overall test

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on PR #43968: URL: https://github.com/apache/arrow/pull/43968#issuecomment-2333163406 Probably we should address that as well. I think it is in scope. Sorry I missed that.

Re: [PR] MINOR: [CI][C++] Add C++ example builds to "cpp" Crossbow task group [arrow]

2024-09-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43975: URL: https://github.com/apache/arrow/pull/43975#issuecomment-2333162246 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9761241bc831b7f421558622410a39fe4f9aa563. There were no

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
hellishfire commented on PR #43968: URL: https://github.com/apache/arrow/pull/43968#issuecomment-2333162849 > Do null slots still compare equal when the child values are different? I'd like to see a unit test for that. (They should, but it looks like we just directly dispatch to RangeEquals

[PR] MINOR: [Java] Bump com.puppycrawl.tools:checkstyle from 10.17.0 to 10.18.1 in /java Manual [arrow]

2024-09-05 Thread via GitHub
vibhatha opened a new pull request, #43980: URL: https://github.com/apache/arrow/pull/43980 ### Rationale for this change The dependabot PR https://github.com/apache/arrow/pull/43922 automated change doesn't fix some code level changes required. This PR fixes that. ### What ch

[PR] GH-43576: [Java] Gandiva Tests are failing due to linking issues [arrow]

2024-09-05 Thread via GitHub
vibhatha opened a new pull request, #43978: URL: https://github.com/apache/arrow/pull/43978 ### Rationale for this change Gandiva tests are failing due to a linking issue and it is failing the Java CIs. But for most of the made PRs, we cannot verify the overall workflow given that th

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on PR #43968: URL: https://github.com/apache/arrow/pull/43968#issuecomment-2333069427 @lidavidm appreciate your review and also please approve the workflows.

Re: [PR] WIP: Proof-of-concept Parquet GEOMETRY logical type implementation [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43977: URL: https://github.com/apache/arrow/pull/43977#issuecomment-2333063054 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] chore(rust): update package metadata [arrow-adbc]

2024-09-05 Thread via GitHub
lidavidm opened a new pull request, #2132: URL: https://github.com/apache/arrow-adbc/pull/2132 Fixes #2124.

[PR] chore(ci): stop building for Python 3.8 [arrow-adbc]

2024-09-05 Thread via GitHub
lidavidm opened a new pull request, #2131: URL: https://github.com/apache/arrow-adbc/pull/2131 Fixes #2129.

Re: [I] Add Flight JDBC Connection String example [arrow-flight-sql-postgresql]

2024-09-05 Thread via GitHub
kou commented on issue #190: URL: https://github.com/apache/arrow-flight-sql-postgresql/issues/190#issuecomment-2332984323 Could you share PostgreSQL log on the error?

Re: [PR] GH-43684: [Python][Dataset] Python / Cython interface to C++ arrow::dataset::Partitioning::Format [arrow]

2024-09-05 Thread via GitHub
amoeba commented on code in PR #43740: URL: https://github.com/apache/arrow/pull/43740#discussion_r1746347002 ## python/pyarrow/_dataset.pyx: ## @@ -2505,6 +2505,43 @@ cdef class Partitioning(_Weakrefable): result = self.partitioning.Parse(tobytes(path)) return

Re: [PR] GH-43684: [Python][Dataset] Python / Cython interface to C++ arrow::dataset::Partitioning::Format [arrow]

2024-09-05 Thread via GitHub
amoeba commented on PR #43740: URL: https://github.com/apache/arrow/pull/43740#issuecomment-2332965777 @jorisvandenbossche and/or @pitrou: The changes here look good to me, would either of you like to review?

Re: [I] [Python][FlightRPC] IPC error using Python GeneratorStream for tables containing Categorical / DictionaryArray [arrow]

2024-09-05 Thread via GitHub
lidavidm commented on issue #38480: URL: https://github.com/apache/arrow/issues/38480#issuecomment-2332957148 https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.scanner

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43972: URL: https://github.com/apache/arrow/pull/43972#issuecomment-2332915508 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c2123b8b90ab952f854912459bb33ebaf0d99611. There were no

Re: [I] adbc_ingest is working better, but still has some COPY INTO issues after adbc 1.2 release [arrow-adbc]

2024-09-05 Thread via GitHub
joellubi commented on issue #2128: URL: https://github.com/apache/arrow-adbc/issues/2128#issuecomment-2332794697 I know the latest gosnowflake release included some improvements to context propagation/cancelation. I wonder if one of those changes could be causing this error. I'll try to rep

Re: [I] adbc_ingest is working better, but still has some COPY INTO issues after adbc 1.2 release [arrow-adbc]

2024-09-05 Thread via GitHub
joellubi commented on issue #2128: URL: https://github.com/apache/arrow-adbc/issues/2128#issuecomment-2332696973 Thanks for the quick feedback @davlee1972. Can you check whether the copy history view is consistent with 901 files missing from the stage? The error is reporting that afte

Re: [I] Add Flight JDBC Connection String example [arrow-flight-sql-postgresql]

2024-09-05 Thread via GitHub
edmondop commented on issue #190: URL: https://github.com/apache/arrow-flight-sql-postgresql/issues/190#issuecomment-2332693689 Reading the logs carefully, I figured out that an earlier problem was hiding the real problem. ``` postgres-1 | postgres-1 | 2024-0

Re: [PR] Improve performance of set_bits by avoiding to set individual bits [arrow-rs]

2024-09-05 Thread via GitHub
kazuyukitanimura commented on code in PR #6288: URL: https://github.com/apache/arrow-rs/pull/6288#discussion_r1746191197 ## arrow-buffer/src/util/bit_mask.rs: ## @@ -32,33 +31,126 @@ pub fn set_bits( ) -> usize { let mut null_count = 0; -let mut bits_to_align = offse

Re: [I] [C++][Parquet] Add support for arrow::ArrayStatistics: non zero-copy int based types [arrow]

2024-09-05 Thread via GitHub
kou commented on issue #43944: URL: https://github.com/apache/arrow/issues/43944#issuecomment-2332595060 Issue resolved by pull request 43945 https://github.com/apache/arrow/pull/43945

Re: [PR] MINOR: [CI][C++] Add C++ example builds to "cpp" Crossbow task group [arrow]

2024-09-05 Thread via GitHub
kou merged PR #43975: URL: https://github.com/apache/arrow/pull/43975

Re: [PR] GH-4: Update package name [arrow-go]

2024-09-05 Thread via GitHub
kou commented on PR #88: URL: https://github.com/apache/arrow-go/pull/88#issuecomment-2332574020 I want to prevent new Go commits from being added to apache/arrow after this, because cherry-picking will be difficult afterwards.

Re: [PR] GH-11: Add test CI: macOS [arrow-go]

2024-09-05 Thread via GitHub
kou commented on PR #86: URL: https://github.com/apache/arrow-go/pull/86#issuecomment-2332572063 OK. How about #88?

[PR] GH-4: Update package name [arrow-go]

2024-09-05 Thread via GitHub
kou opened a new pull request, #88: URL: https://github.com/apache/arrow-go/pull/88 Fix GH-4 This is done by the following command line: find . -type f -exec sed -i'' -e 's,apache/arrow/go,apache/arrow-go,g' '{}' ';'
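
The `sed` command in this PR description replaces every occurrence of `apache/arrow/go` with `apache/arrow-go`. As a rough illustration only (not part of the PR), the same substitution can be sketched in Python. The function name `rewrite_package_path` and the sample import line are hypothetical and exist only to show the effect on one string:

```python
# Hypothetical sketch of the substitution done by the sed expression
# 's,apache/arrow/go,apache/arrow-go,g' (names below are illustrative).
OLD = "apache/arrow/go"
NEW = "apache/arrow-go"


def rewrite_package_path(text: str) -> str:
    """Replace every occurrence of OLD with NEW, like sed's global flag."""
    return text.replace(OLD, NEW)


# Illustrative input line, not taken from the PR itself.
sample = 'import "github.com/apache/arrow/go/v17/arrow"'
rewritten = rewrite_package_path(sample)
print(rewritten)
```

Unlike the `find ... -exec sed -i` pipeline, this sketch works on a single in-memory string and does not touch any files.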

Re: [I] Push v17.0.0 tag to support migrating to apache/arrow-go without changing version [arrow-go]

2024-09-05 Thread via GitHub
kou commented on issue #87: URL: https://github.com/apache/arrow-go/issues/87#issuecomment-2332548094 OK. I withdraw this.

Re: [PR] GH-43956: [C++][Format] Add initial Decimal32/Decimal64 implementations [arrow]

2024-09-05 Thread via GitHub
kou commented on code in PR #43957: URL: https://github.com/apache/arrow/pull/43957#discussion_r1746083477 ## cpp/src/arrow/integration/json_internal.cc: ## @@ -969,11 +1009,17 @@ Result> GetDecimal(const RjObject& json_type) { bit_width = maybe_bit_width.ValueOrDie();

[PR] chore(ci): Fix docker images that are failing to build [arrow-nanoarrow]

2024-09-05 Thread via GitHub
paleolimbot opened a new pull request, #603: URL: https://github.com/apache/arrow-nanoarrow/pull/603 (no comment)

Re: [PR] feat: Implement read support for String/Binary View types [arrow-nanoarrow]

2024-09-05 Thread via GitHub
WillAyd commented on PR #596: URL: https://github.com/apache/arrow-nanoarrow/pull/596#issuecomment-2332375896 Bah ignore what I just said - bad debugging. Digging deeper!

Re: [I] [Python] Accessing parquet files with parquet.read_table in google cloud storage fails, but works with dataset, works in 16.1.0 fails in 17.0.0 [arrow]

2024-09-05 Thread via GitHub
amoeba commented on issue #43574: URL: https://github.com/apache/arrow/issues/43574#issuecomment-2332362113 Thanks. Some thoughts: - `read_table` errors in your original code where `ds.dataset` does not because (1) `read_table` defaults to Hive partitioning and `ds.dataset` doesn't (

Re: [PR] feat: Implement read support for String/Binary View types [arrow-nanoarrow]

2024-09-05 Thread via GitHub
WillAyd commented on PR #596: URL: https://github.com/apache/arrow-nanoarrow/pull/596#issuecomment-2332359507 Ah OK interesting. Do you know if this is confirmed to be working upstream in Arrow? The reason I ask is that when running the test suite for what's already in this PR, when I hit t

Re: [PR] GH-43946: [C++][Parquet] Guard against use of cleared decryptor/encryptor [arrow]

2024-09-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #43947: URL: https://github.com/apache/arrow/pull/43947#issuecomment-2332286520 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 032e6a42bfa359b62d0ed4e5d9b44582a558c87e. There were no

[I] How does `Buffer::from_custom_allocation` work ? [arrow-rs]

2024-09-05 Thread via GitHub
edgarriba opened a new issue, #6362: URL: https://github.com/apache/arrow-rs/issues/6362 hi, I have the [kornia-rs](https://github.com/kornia/kornia-rs) crate where I implemented a custom [`TensorStorage`](https://github.com/kornia/kornia-rs/blob/main/crates/kornia-core/src/storage.rs#L28)

Re: [I] [Python] Accessing parquet files with parquet.read_table in google cloud storage fails, but works with dataset, works in 16.1.0 fails in 17.0.0 [arrow]

2024-09-05 Thread via GitHub
brokenjacobs commented on issue #43574: URL: https://github.com/apache/arrow/issues/43574#issuecomment-2332262415 FWIW we have other files with alphanumerics in that field as well.

Re: [I] [Python] Accessing parquet files with parquet.read_table in google cloud storage fails, but works with dataset, works in 16.1.0 fails in 17.0.0 [arrow]

2024-09-05 Thread via GitHub
brokenjacobs commented on issue #43574: URL: https://github.com/apache/arrow/issues/43574#issuecomment-2332261083 > Can you share the schema of the file here? `pa.parquet.read_schema('gs:///v1/li191r/ms=2023-01/source_id=9319/li191r_9319_2023-01-02.parquet')` should be enough. ```

Re: [PR] Benchmark for bit_mask (set_bits) [arrow-rs]

2024-09-05 Thread via GitHub
kazuyukitanimura commented on PR #6353: URL: https://github.com/apache/arrow-rs/pull/6353#issuecomment-2332232251 Thank you @andygrove @alamb

Re: [PR] WIP: Proof-of-concept Parquet GEOMETRY logical type implementation [arrow]

2024-09-05 Thread via GitHub
paleolimbot commented on code in PR #43196: URL: https://github.com/apache/arrow/pull/43196#discussion_r1745868930 ## cpp/src/parquet/statistics.h: ## @@ -135,6 +186,7 @@ class PARQUET_EXPORT EncodedStatistics { bool has_max = false; bool has_null_count = false; bool ha

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
pitrou merged PR #43972: URL: https://github.com/apache/arrow/pull/43972

Re: [I] [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
pitrou commented on issue #40154: URL: https://github.com/apache/arrow/issues/40154#issuecomment-2332134445 Issue resolved by pull request 43972 https://github.com/apache/arrow/pull/43972

Re: [PR] WIP: Proof-of-concept Parquet GEOMETRY logical type implementation [arrow]

2024-09-05 Thread via GitHub
wgtmac commented on PR #43196: URL: https://github.com/apache/arrow/pull/43196#issuecomment-2332125711 Thanks @paleolimbot for initiating the PoC! I've left some inline comments on this PR. Thanks @jiayuasu and @Kontinuation to continue the effort! Let's work together to make it to t

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
pitrou commented on PR #43726: URL: https://github.com/apache/arrow/pull/43726#issuecomment-2332115594 Thanks for the fix and tests @mapleFU !

Re: [I] Relax `dict_id` equality in field merging [arrow-rs]

2024-09-05 Thread via GitHub
brancz commented on issue #6356: URL: https://github.com/apache/arrow-rs/issues/6356#issuecomment-2332102012 Isn't that just an optimization though? I understand that if two dict IDs are identical one can just append the index arrays to each other safely, and if not then the dicts needs to
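
The idea in this comment can be sketched with a toy model in plain Python: when two dictionary-encoded arrays share the same dictionary, their index arrays can simply be concatenated. The comment is cut off before it says what happens otherwise, so the second branch below is an assumption: it unifies the two dictionaries and remaps the second array's indices. This is not arrow-rs code, and `concat_dict_arrays` and its parameters are hypothetical names.

```python
def concat_dict_arrays(dict_a, idx_a, dict_b, idx_b):
    """Concatenate two dictionary-encoded arrays given as (values, indices).

    Toy model only: dictionaries are lists of unique values, indices are
    lists of integer positions into them.
    """
    if dict_a == dict_b:
        # Same dictionary: indices mean the same thing, so just append them.
        return list(dict_a), list(idx_a) + list(idx_b)

    # Different dictionaries (assumed handling): build a merged dictionary
    # and translate every index of the second array into it.
    merged = list(dict_a)
    position = {value: i for i, value in enumerate(merged)}
    remap = []
    for value in dict_b:
        if value not in position:
            position[value] = len(merged)
            merged.append(value)
        remap.append(position[value])
    return merged, list(idx_a) + [remap[i] for i in idx_b]


values, indices = concat_dict_arrays(["a", "b"], [0, 1, 0], ["b", "c"], [0, 1])
decoded = [values[i] for i in indices]
```

The fast path avoids touching the values at all, which is why matching dictionaries can be seen as an optimization rather than a correctness requirement in this simplified model.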

Re: [PR] MINOR: [CI][C++] Add C++ example builds to "cpp" Crossbow task group [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43975: URL: https://github.com/apache/arrow/pull/43975#issuecomment-2332086220 Revision: 767de80d3e61e8c899d11cdfec48fff9e0c7d127 Submitted crossbow builds: [ursacomputing/crossbow @ actions-4f5e187eb4](https://github.com/ursacomputing/crossbow/bra

[PR] MINOR: [CI][C++] Add example builds to "cpp" Crossbow task group [arrow]

2024-09-05 Thread via GitHub
pitrou opened a new pull request, #43975: URL: https://github.com/apache/arrow/pull/43975 ### Rationale for this change The `python` task group already includes the Python example builds. This PR does the same for the `cpp` task group. ### Are these changes tested? By CI

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43972: URL: https://github.com/apache/arrow/pull/43972#issuecomment-2332049154 Revision: be1725906eae4f3773b165546cd8bfe5af58fa9f Submitted crossbow builds: [ursacomputing/crossbow @ actions-d1fc910e2b](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
pitrou commented on code in PR #43726: URL: https://github.com/apache/arrow/pull/43726#discussion_r1745776051 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -366,8 +366,14 @@ std::optional ParquetFileFragment::EvaluateStatisticsAsExpr const parquet::Statistics& statistics

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
pitrou commented on code in PR #43726: URL: https://github.com/apache/arrow/pull/43726#discussion_r1745775278 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -366,8 +366,14 @@ std::optional ParquetFileFragment::EvaluateStatisticsAsExpr const parquet::Statistics& statistics

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
mapleFU commented on code in PR #43726: URL: https://github.com/apache/arrow/pull/43726#discussion_r1745771134 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -366,8 +366,13 @@ std::optional ParquetFileFragment::EvaluateStatisticsAsExpr const parquet::Statistics& statistic

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
mapleFU commented on code in PR #43726: URL: https://github.com/apache/arrow/pull/43726#discussion_r1745770468 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -366,8 +366,13 @@ std::optional ParquetFileFragment::EvaluateStatisticsAsExpr const parquet::Statistics& statistic

Re: [I] [Python] Segmentation fault when pyarrow is imported in exit handler [arrow]

2024-09-05 Thread via GitHub
pitrou commented on issue #38626: URL: https://github.com/apache/arrow/issues/38626#issuecomment-2332028208 @asda10 This does not seem related to this issue, can you open a new issue for it? Also please include a reproducer so that we can try it out ourselves.

Re: [PR] GH-32538: [C++][Parquet] Add JSON canonical extension type [arrow]

2024-09-05 Thread via GitHub
rok commented on code in PR #13901: URL: https://github.com/apache/arrow/pull/13901#discussion_r1745764195 ## cpp/src/arrow/extension/json.cc: ## @@ -0,0 +1,62 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the

Re: [PR] GH-43956: [C++][Format] Add initial Decimal32/Decimal64 implementations [arrow]

2024-09-05 Thread via GitHub
wgtmac commented on PR #43957: URL: https://github.com/apache/arrow/pull/43957#issuecomment-2332009931 > > Following the same pattern, this change means that using `decimal(precision, scale)` instead of the specific `decimal32`/`decimal64`/`decimal128`/`decimal256` functions results in the

Re: [PR] GH-32538: [C++][Parquet] Add JSON canonical extension type [arrow]

2024-09-05 Thread via GitHub
pitrou commented on code in PR #13901: URL: https://github.com/apache/arrow/pull/13901#discussion_r1745722573 ## cpp/src/arrow/extension/json_test.cc: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Add Meson build with Werror [arrow-nanoarrow]

2024-09-05 Thread via GitHub
WillAyd commented on code in PR #448: URL: https://github.com/apache/arrow-nanoarrow/pull/448#discussion_r1745724686 ## meson.build: ## @@ -35,6 +35,12 @@ project( # add_project_arguments(['-fvisibility=hidden'], language: 'cpp') # endif +cpp = meson.get_compiler('cpp') +a

Re: [PR] Add Meson build with Werror [arrow-nanoarrow]

2024-09-05 Thread via GitHub
WillAyd commented on code in PR #448: URL: https://github.com/apache/arrow-nanoarrow/pull/448#discussion_r1744621266 ## src/nanoarrow/common/array_test.cc: ## @@ -98,9 +98,11 @@ TEST(ArrayTest, ArrayTestAllocateChildren) { ArrowArrayRelease(&array); ASSERT_EQ(ArrowArrayI

Re: [PR] Benchmark for bit_mask (set_bits) [arrow-rs]

2024-09-05 Thread via GitHub
alamb merged PR #6353: URL: https://github.com/apache/arrow-rs/pull/6353

Re: [PR] Benchmark for bit_mask (set_bits) [arrow-rs]

2024-09-05 Thread via GitHub
alamb commented on code in PR #6353: URL: https://github.com/apache/arrow-rs/pull/6353#discussion_r1745714413 ## arrow-buffer/benches/bit_mask.rs: ## @@ -0,0 +1,58 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] GH-11: Add test CI: macOS [arrow-go]

2024-09-05 Thread via GitHub
zeroshade commented on PR #86: URL: https://github.com/apache/arrow-go/pull/86#issuecomment-2331897474 Before we can truly claim that the CI is working, we need to update the package path of the `go.mod` file and all of the imports in the files to point to itself here instead of the old pat

Re: [I] Add Catalog DB Schema subcommands to `flight_sql_client` [arrow-rs]

2024-09-05 Thread via GitHub
crepererum closed issue #6331: Add Catalog DB Schema subcommands to `flight_sql_client` URL: https://github.com/apache/arrow-rs/issues/6331

Re: [PR] feat: add catalog/schema subcommands to flight_sql_client. [arrow-rs]

2024-09-05 Thread via GitHub
crepererum merged PR #6332: URL: https://github.com/apache/arrow-rs/pull/6332

Re: [I] Release arrow-rs / parquet major version `53.0.0` (September 2024) [arrow-rs]

2024-09-05 Thread via GitHub
alamb commented on issue #6016: URL: https://github.com/apache/arrow-rs/issues/6016#issuecomment-2331856481 > Great work! The newly introduced `execute_ingest` is very helpful! I think we can thank @djanderson for that https://github.com/apache/arrow-rs/pull/6201 ❤️

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
vibhatha commented on PR #43968: URL: https://github.com/apache/arrow/pull/43968#issuecomment-2331839334 Overall LGTM!

Re: [PR] GH-37756: [Format][Docs] Document IPC Compression [arrow]

2024-09-05 Thread via GitHub
ianmcook commented on code in PR #43950: URL: https://github.com/apache/arrow/pull/43950#discussion_r1745604553 ## docs/source/format/Columnar.rst: ## @@ -1385,6 +1387,45 @@ have two entries in each RecordBatch. For a RecordBatch of this schema with buffer 13: col2data

Re: [I] Relax `dict_id` equality in field merging [arrow-rs]

2024-09-05 Thread via GitHub
tustvold commented on issue #6356: URL: https://github.com/apache/arrow-rs/issues/6356#issuecomment-2331741770 https://github.com/apache/arrow-rs/issues/5981 is possibly related. As it stands I am not sure we can ignore the dict ID when merging, as unlike with nulls, two arrays with d

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
mapleFU commented on PR #43972: URL: https://github.com/apache/arrow/pull/43972#issuecomment-2331738033 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
wgtmac commented on PR #43972: URL: https://github.com/apache/arrow/pull/43972#issuecomment-2331728370 I just took a glimpse of it. It looks great! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] GH-40154: [C++][Parquet] Separate encoders and decoder [arrow]

2024-09-05 Thread via GitHub
pitrou commented on PR #43972: URL: https://github.com/apache/arrow/pull/43972#issuecomment-2331723949 I've checked that `git blame -C` is able to trace back through the history for `encoder.cc` and `decoder.cc` (though `git log -C` isn't, for some reason). -- This is an automated message

Re: [PR] GH-43964: [Python] Build macOS and manylinux wheels for free-threading [arrow]

2024-09-05 Thread via GitHub
lysnikolaou commented on PR #43965: URL: https://github.com/apache/arrow/pull/43965#issuecomment-2331724905 That's one option. To basically add a _new_ service that installs `python3.13-nogil` and then uses that to test the wheels, however that would mean that we have to call different test

Re: [PR] Fix `MutableBuffer::into_buffer` leaking its extra capacity into the final buffer [arrow-rs]

2024-09-05 Thread via GitHub
teh-cmc commented on PR #6300: URL: https://github.com/apache/arrow-rs/pull/6300#issuecomment-2331714770 :+1: Alright, closing this in favor of this issue then: * https://github.com/apache/arrow-rs/issues/6360 I won't be able to attend to it for a couple weeks though. -- This is

[I] Add `shrink_to_fit` to `Array` [arrow-rs]

2024-09-05 Thread via GitHub
teh-cmc opened a new issue, #6360: URL: https://github.com/apache/arrow-rs/issues/6360 We keep a lot of Arrow data in long-lived memory storage, and therefore must be guaranteed that the data has been optimized for space before it gets committed to storage, regardless of how it got built or

Re: [I] [Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path [arrow]

2024-09-05 Thread via GitHub
pitrou commented on issue #41365: URL: https://github.com/apache/arrow/issues/41365#issuecomment-2331654392 > One option could be to use that in `FileSystemFromUri`, for example when the parsing fails check if it actually is a likely URI, and if not raise a different error message? >

Re: [I] Add support for BinaryView in arrow_string::length [arrow-rs]

2024-09-05 Thread via GitHub
Omega359 commented on issue #6358: URL: https://github.com/apache/arrow-rs/issues/6358#issuecomment-2331608046 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] GH-43518: [Python][Packaging][CI] Drop Python 3.8 support [arrow]

2024-09-05 Thread via GitHub
jorisvandenbossche commented on code in PR #43970: URL: https://github.com/apache/arrow/pull/43970#discussion_r1745414082 ## .github/workflows/go.yml: ## @@ -209,7 +209,7 @@ jobs: - name: Setup Python uses: actions/setup-python@f677139bbe7f9c59b41e40162b753c062f5

Re: [PR] GH-43964: [Python] Build macOS and manylinux wheels for free-threading [arrow]

2024-09-05 Thread via GitHub
jorisvandenbossche commented on PR #43965: URL: https://github.com/apache/arrow/pull/43965#issuecomment-2331456946 > Maybe start with an Ubuntu image and install a Python from deadsnakes? I assume it could be similar to the one you are adding for https://github.com/apache/arrow/pull/

Re: [PR] GH-43118: [JS] Add interval for unit MONTH_DAY_NANO [arrow]

2024-09-05 Thread via GitHub
domoritz commented on code in PR #43117: URL: https://github.com/apache/arrow/pull/43117#discussion_r1745373233 ## dev/archery/archery/integration/datagen.py: ## @@ -1891,9 +1887,11 @@ def _temp_path(): generate_duration_case(), -generate_interval_case(), +

Re: [PR] GH-43712: [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly when !HasNullCount() [arrow]

2024-09-05 Thread via GitHub
mapleFU commented on PR #43726: URL: https://github.com/apache/arrow/pull/43726#issuecomment-2331378412 @pitrou @bkietz I've changed it to return `is_null` when the value count is 0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] GH-43969: [CI][Dev] Prune .dockerignore [arrow]

2024-09-05 Thread via GitHub
pitrou commented on PR #43971: URL: https://github.com/apache/arrow/pull/43971#issuecomment-2331366497 CI failures look unrelated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] GH-43964: [Python] Build macOS and manylinux wheels for free-threading [arrow]

2024-09-05 Thread via GitHub
lysnikolaou commented on PR #43965: URL: https://github.com/apache/arrow/pull/43965#issuecomment-2331354062 > For linux (where it already succeeds to the testing step), we will also need to get a 3.13t build of the test image (`python-wheel-manylinux-test-unittests`) Yeah, I'm workin

Re: [I] [Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path [arrow]

2024-09-05 Thread via GitHub
pitrou commented on issue #41365: URL: https://github.com/apache/arrow/issues/41365#issuecomment-2331335872 There are all kinds of failure modes: ```python >>> pq.read_table("/local file") Traceback (most recent call last): ... FileNotFoundError: /local file >>> pq.rea

Re: [I] [Python] Pyarrow fs incorrectly resolves S3 URIs with white space as a local path [arrow]

2024-09-05 Thread via GitHub
jorisvandenbossche commented on issue #41365: URL: https://github.com/apache/arrow/issues/41365#issuecomment-2331306145 So going through the logic in the `_resolve_filesystem_and_path` helper function, I think the main reason we catch (and swallow) the error from `FileSystem.from_uri` is be

Re: [PR] GH-43969: [CI][Dev] Prune .dockerignore [arrow]

2024-09-05 Thread via GitHub
pitrou commented on PR #43971: URL: https://github.com/apache/arrow/pull/43971#issuecomment-2331266993 @github-actions crossbow submit -g nightly-release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] GH-43966: [Java] Check for nullabilities when comparing StructVector [arrow]

2024-09-05 Thread via GitHub
hellishfire commented on code in PR #43968: URL: https://github.com/apache/arrow/pull/43968#discussion_r1745226535 ## java/vector/src/main/java/org/apache/arrow/vector/compare/RangeEqualsVisitor.java: ## @@ -354,6 +355,18 @@ protected boolean compareStructVectors(Range range) {

Re: [PR] GH-43969: [CI][Dev] Prune .dockerignore [arrow]

2024-09-05 Thread via GitHub
github-actions[bot] commented on PR #43971: URL: https://github.com/apache/arrow/pull/43971#issuecomment-2331171637 Revision: 1cf73addfcdeb14e7e4d37482966c402773a82ab Submitted crossbow builds: [ursacomputing/crossbow @ actions-e95f93017a](https://github.com/ursacomputing/crossbow/bra
