Re: [I] [Variant] Present Variant at Iceberg Summit NYC July 10, 2025 [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on issue #7858: URL: https://github.com/apache/arrow-rs/issues/7858#issuecomment-3065091187 I think they recorded the talk -- I will post it here when it goes online -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [I] [Variant] Present Variant at Iceberg Summit NYC July 10, 2025 [arrow-rs]

2025-07-12 Thread via GitHub
alamb closed issue #7858: [Variant] Present Variant at Iceberg Summit NYC July 10, 2025 URL: https://github.com/apache/arrow-rs/issues/7858

Re: [I] [Parquet] [Question / Potential Bug Report] Should `SerializedPageReaderState.offset` & `remaining_bytes` be `u64` instead of `usize`? [arrow-rs]

2025-07-12 Thread via GitHub
JigaoLuo commented on issue #7910: URL: https://github.com/apache/arrow-rs/issues/7910#issuecomment-3065309367 Update: added `remaining_bytes` to the title.

Re: [I] [Variant] Avoid collecting offset iterator [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on issue #7901: URL: https://github.com/apache/arrow-rs/issues/7901#issuecomment-3065386646 Hi, I've already made some work on this issue in https://github.com/apache/arrow-rs/pull/7906/files I had some ideas on the other areas. Any chance I could push that P
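A minimal sketch of the idea behind this issue, assuming offsets are stored as little-endian unsigned integers 1 to 4 bytes wide (as in the Variant encoding): decode them lazily from the byte slice, so a caller can validate or scan them without first collecting into a `Vec`. The names `offsets` and `max_offset` are hypothetical, not the parquet-variant API.

```rust
/// Decode `nbytes`-wide little-endian offsets lazily from `buf`.
/// A trailing partial chunk is ignored here; real code would reject it.
pub fn offsets(buf: &[u8], nbytes: usize) -> impl Iterator<Item = usize> + '_ {
    assert!((1..=4).contains(&nbytes), "offset width must be 1..=4 bytes");
    buf.chunks_exact(nbytes).map(|chunk| {
        // Zero-extend the 1..=4 bytes into a u32, least significant byte first.
        let mut le = [0u8; 4];
        le[..chunk.len()].copy_from_slice(chunk);
        u32::from_le_bytes(le) as usize
    })
}

/// Example consumer: the largest offset, found in one pass with no allocation.
pub fn max_offset(buf: &[u8], nbytes: usize) -> Option<usize> {
    offsets(buf, nbytes).max()
}
```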

Re: [PR] [Variant] Use simdutf8 for UTF-8 validation [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7908: URL: https://github.com/apache/arrow-rs/pull/7908#discussion_r2202644249 ## parquet-variant/Cargo.toml: ## @@ -37,6 +37,7 @@ arrow-schema = { workspace = true } chrono = { workspace = true } indexmap = "2.10.0" +simdutf8 = { wor

Re: [PR] [Variant] Use simdutf8 for UTF-8 validation [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7908: URL: https://github.com/apache/arrow-rs/pull/7908#discussion_r2202644896 ## parquet-variant/Cargo.toml: ## @@ -37,6 +37,7 @@ arrow-schema = { workspace = true } chrono = { workspace = true } indexmap = "2.10.0" +simdutf8 = { wor

Re: [PR] [Parquet] Use `u64` for `SerializedPageReaderState.offset` & `remaining_bytes`, instead of `usize` [arrow-rs]

2025-07-12 Thread via GitHub
XiangpengHao commented on code in PR #7918: URL: https://github.com/apache/arrow-rs/pull/7918#discussion_r2202783668 ## parquet/src/file/serialized_reader.rs: ## @@ -469,10 +469,10 @@ pub(crate) fn decode_page( enum SerializedPageReaderState { Values { /// The cur

Re: [PR] [Variant] test: add variant object tests with different sizes [arrow-rs]

2025-07-12 Thread via GitHub
odysa commented on code in PR #7896: URL: https://github.com/apache/arrow-rs/pull/7896#discussion_r2202782036 ## parquet-variant/src/builder.rs: ## @@ -2171,4 +2171,116 @@ mod tests { let variant = Variant::try_new_with_metadata(metadata, &value).unwrap(); ass

[PR] adbc_driver_manager.h: fix invalid Doxygen documentation [arrow-adbc]

2025-07-12 Thread via GitHub
rouault opened a new pull request, #3141: URL: https://github.com/apache/arrow-adbc/pull/3141 that raises warnings with clang -Wdocumentation

[PR] Improve memory usage for `arrow-row -> String/BinaryView` when utf8 validation disabled [arrow-rs]

2025-07-12 Thread via GitHub
ding-young opened a new pull request, #7917: URL: https://github.com/apache/arrow-rs/pull/7917 # Which issue does this PR close? - Related to #6057 . # Rationale for this change As described in above issue, we currently # What changes are included in this PR? 1. se

Re: [I] [Parquet] [Question / Potential Bug Report] Should `SerializedPageReaderState.offset` be `u64` instead of `usize`? [arrow-rs]

2025-07-12 Thread via GitHub
jhorstmann commented on issue #7910: URL: https://github.com/apache/arrow-rs/issues/7910#issuecomment-3065102631 I think this sounds like a very reasonable change, the offset is used as a parameter to `ChunkReader::get_read`, where it currently has to be cast back from `usize` to `u64`. The

Re: [I] [Parquet] [Question / Potential Bug Report] Should `SerializedPageReaderState.offset` be `u64` instead of `usize`? [arrow-rs]

2025-07-12 Thread via GitHub
JigaoLuo commented on issue #7910: URL: https://github.com/apache/arrow-rs/issues/7910#issuecomment-3065107407 Thanks for the confirmation! I also agree that `remaining_bytes` should be changed to u64, since these fields represent **file** offsets and shouldn't be affected by differences
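To illustrate why `u64` suits these fields, here is a sketch with a hypothetical `PageCursor` standing in for the real reader state (which has more fields). A file offset past 4 GiB does not fit in a 32-bit `usize`; `u32` models that case below.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the reader state discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PageCursor {
    /// Absolute byte position in the file: a file offset, so `u64`.
    pub offset: u64,
    /// Bytes left to read in the column chunk.
    pub remaining_bytes: u64,
}

impl PageCursor {
    /// Advance past a page of `len` bytes. Returns `None` (leaving the cursor
    /// unchanged) if that would run past the chunk or overflow the offset.
    pub fn advance(&mut self, len: u64) -> Option<()> {
        let remaining = self.remaining_bytes.checked_sub(len)?;
        let offset = self.offset.checked_add(len)?;
        self.remaining_bytes = remaining;
        self.offset = offset;
        Some(())
    }
}

/// What a `usize` field would impose on a 32-bit target, modeled with `u32`.
pub fn offset_fits_32bit(offset: u64) -> bool {
    u32::try_from(offset).is_ok()
}
```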

[PR] Restructure compare_greater function used in parquet statistics for better performance [arrow-rs]

2025-07-12 Thread via GitHub
jhorstmann opened a new pull request, #7916: URL: https://github.com/apache/arrow-rs/pull/7916 # Which issue does this PR close? Another small optimization to parquet writing, followup to #7822 (I can create a separate issue if needed). # Rationale for this change Improv

[PR] Use `u64` for `SerializedPageReaderState.offset` & `remaining_bytes`, instead of `usize` [arrow-rs]

2025-07-12 Thread via GitHub
JigaoLuo opened a new pull request, #7918: URL: https://github.com/apache/arrow-rs/pull/7918 # Which issue does this PR close? Closes #7910 # Rationale for this change There is a copy from my issue page: https://github.com/apache/arrow-rs/blob/2be261b78b16a4aa7b5b9aec

[PR] GH-40730: [C++][Parquet] Add setting to limit the number of rows written per page [arrow]

2025-07-12 Thread via GitHub
wgtmac opened a new pull request, #47090: URL: https://github.com/apache/arrow/pull/47090 ### Rationale for this change Currently only page size is limited. We need to limit number of rows per page too. ### What changes are included in this PR? Add `parquet::WriterProper
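The PR itself is C++; this Rust sketch only illustrates the kind of flush rule such a setting implies, assuming a page is closed when the next row would exceed either a byte limit or a row limit. `PageLimits` and `split_into_pages` are hypothetical names, not `parquet::WriterProperties` options.

```rust
/// Hypothetical page limits (illustrative names only).
pub struct PageLimits {
    pub max_page_bytes: usize,
    pub max_page_rows: usize,
}

/// Group encoded row sizes into pages, closing the current page when adding
/// the next row would exceed either limit. Returns the row count per page.
/// A single row larger than `max_page_bytes` still gets a page of its own.
pub fn split_into_pages(row_sizes: &[usize], limits: &PageLimits) -> Vec<usize> {
    let mut pages = Vec::new();
    let (mut rows, mut bytes) = (0usize, 0usize);
    for &size in row_sizes {
        let too_many_rows = rows + 1 > limits.max_page_rows;
        let too_many_bytes = bytes + size > limits.max_page_bytes;
        if rows > 0 && (too_many_rows || too_many_bytes) {
            pages.push(rows);
            rows = 0;
            bytes = 0;
        }
        rows += 1;
        bytes += size;
    }
    if rows > 0 {
        pages.push(rows);
    }
    pages
}
```

With only a byte limit, many tiny rows can land in one page; the row limit caps that independently.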

Re: [PR] [Variant] Use simdutf8 for UTF-8 validation [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7908: URL: https://github.com/apache/arrow-rs/pull/7908#discussion_r2202816319 ## parquet-variant/src/utils.rs: ## @@ -74,13 +74,32 @@ pub(crate) fn first_byte_from_slice(slice: &[u8]) -> Result { .ok_or_else(|| ArrowError::InvalidArgum

Re: [PR] GH-40730: [C++][Parquet] Add setting to limit the number of rows written per page [arrow]

2025-07-12 Thread via GitHub
github-actions[bot] commented on PR #47090: URL: https://github.com/apache/arrow/pull/47090#issuecomment-3065863537 :warning: GitHub issue #40730 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] [Parquet] Use `u64` for `SerializedPageReaderState.offset` & `remaining_bytes`, instead of `usize` [arrow-rs]

2025-07-12 Thread via GitHub
JigaoLuo commented on PR #7918: URL: https://github.com/apache/arrow-rs/pull/7918#issuecomment-3065865723 Thank you @XiangpengHao for reviewing. I have updated the doc.

Re: [PR] GH-40730: [C++][Parquet] Add setting to limit the number of rows written per page [arrow]

2025-07-12 Thread via GitHub
wgtmac commented on PR #47090: URL: https://github.com/apache/arrow/pull/47090#issuecomment-3065865847 Please check if this is the right direction. @pitrou @mapleFU @adamreeve BTW, some existing test cases will break if I switch the default value to limit 20,000 rows per page. I'm no

Re: [PR] GH-47030: [C++][Parquet] Add setting to limit the number of rows written per page [arrow]

2025-07-12 Thread via GitHub
github-actions[bot] commented on PR #47090: URL: https://github.com/apache/arrow/pull/47090#issuecomment-3065866565 :warning: GitHub issue #47030 **has been automatically assigned in GitHub** to PR creator.

[PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
Samyak2 opened a new pull request, #7919: URL: https://github.com/apache/arrow-rs/pull/7919 # Which issue does this PR close? - Closes #7893 # What changes are included in this PR? Still very early. Opening this PR to get some early feedback on the approach. The ap

Re: [I] [Variant][Compute] `variant_get` kernel [arrow-rs]

2025-07-12 Thread via GitHub
Samyak2 commented on issue #7893: URL: https://github.com/apache/arrow-rs/issues/7893#issuecomment-3065869618 I have a draft here: https://github.com/apache/arrow-rs/pull/7919 It's still very messy, but I wanted to get some early feedback on my approach

Re: [I] [Variant] Offer `simdutf8` as an optional dependency when validating metadata [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on issue #7902: URL: https://github.com/apache/arrow-rs/issues/7902#issuecomment-3065869395 Makes sense -- thank you
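One common way to make a dependency optional is a Cargo feature: declaring `simdutf8` with `optional = true` in `Cargo.toml` creates a feature of the same name, and code can switch on it with `cfg`. The sketch below assumes that setup; `validate_utf8` is a hypothetical helper, and the `simdutf8::basic::from_utf8` path is compiled only when the feature is enabled.

```rust
/// SIMD-accelerated path, used only when the (assumed) `simdutf8` feature is on.
#[cfg(feature = "simdutf8")]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, String> {
    // `basic::Utf8Error` carries no position info, so report a fixed message.
    simdutf8::basic::from_utf8(bytes).map_err(|_| "invalid UTF-8".to_string())
}

/// Fallback: the standard library validator, used when the feature is off.
#[cfg(not(feature = "simdutf8"))]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, String> {
    std::str::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {}", e))
}
```

Both variants share one signature, so callers do not change when the feature is toggled.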

Re: [PR] [Variant] VariantBuilder with VariantMetadata instead of MetadataBuilder [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7915: URL: https://github.com/apache/arrow-rs/pull/7915#discussion_r2202825775 ## parquet-variant/src/builder.rs: ## @@ -835,18 +936,23 @@ impl<'a> ObjectBuilder<'a> { /// /// Note: when inserting duplicate keys, the new value overwrite
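A sketch of the last-write-wins behavior the doc comment describes, using a hypothetical `ObjectSketch` with `i64` values in place of real Variant values. Keeping the field at its original position on overwrite is an assumption made here for illustration.

```rust
/// Hypothetical object builder: fields keep first-insertion order, and
/// inserting an existing key overwrites its value, so the last write wins.
#[derive(Default)]
pub struct ObjectSketch {
    fields: Vec<(String, i64)>,
}

impl ObjectSketch {
    pub fn insert(&mut self, key: &str, value: i64) {
        if let Some(pos) = self.fields.iter().position(|(k, _)| k == key) {
            // Duplicate key: replace the value in place.
            self.fields[pos].1 = value;
        } else {
            self.fields.push((key.to_string(), value));
        }
    }

    pub fn fields(&self) -> &[(String, i64)] {
        &self.fields
    }
}
```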

Re: [PR] [Variant] Use simdutf8 for UTF-8 validation [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7908: URL: https://github.com/apache/arrow-rs/pull/7908#discussion_r2202823545 ## parquet-variant/src/utils.rs: ## @@ -74,13 +74,32 @@ pub(crate) fn first_byte_from_slice(slice: &[u8]) -> Result { .ok_or_else(|| ArrowError::InvalidArgumen

Re: [PR] [Variant] test: add variant object tests with different sizes [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7896: URL: https://github.com/apache/arrow-rs/pull/7896#discussion_r2202826584 ## parquet-variant/src/variant/object.rs: ## @@ -618,4 +620,112 @@ mod tests { ArrowError::InvalidArgumentError(ref msg) if msg.contains("Tried to extract

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202836469 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -168,7 +166,105 @@ impl VariantArrayBuilder { self.value_buffer.extend_from_slice(value); }

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202833244 ## parquet-variant/src/builder.rs: ## @@ -315,29 +343,29 @@ impl MetadataBuilder { let string_start = offset_start + (nkeys + 1) * offset_size as usize;
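The `string_start` expression follows from the Variant metadata layout: one header byte, a `dictionary_size` field of `offset_size` bytes, `dictionary_size + 1` offsets, then the key bytes. A worked sketch of that arithmetic (the function name is hypothetical):

```rust
/// Returns (offset_start, string_start, total_len) for a metadata buffer
/// holding `nkeys` keys whose bytes total `key_bytes`.
pub fn metadata_layout(nkeys: usize, offset_size: u8, key_bytes: usize) -> (usize, usize, usize) {
    let header_len = 1;
    // Offsets begin after the header byte and the dictionary_size field.
    let offset_start = header_len + offset_size as usize;
    // nkeys + 1 offsets: a start for each key plus one final end offset.
    let string_start = offset_start + (nkeys + 1) * offset_size as usize;
    (offset_start, string_start, string_start + key_bytes)
}
```

For 3 keys totalling 10 bytes with 1-byte offsets, the keys start at byte 2 + 4 = 6 and the buffer is 16 bytes long.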

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202830742 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -168,7 +166,105 @@ impl VariantArrayBuilder { self.value_buffer.extend_from_slice(value); }

Re: [PR] [Variant] test: add variant object tests with different sizes [arrow-rs]

2025-07-12 Thread via GitHub
odysa commented on code in PR #7896: URL: https://github.com/apache/arrow-rs/pull/7896#discussion_r2202838686 ## parquet-variant/src/variant/object.rs: ## @@ -618,4 +620,112 @@ mod tests { ArrowError::InvalidArgumentError(ref msg) if msg.contains("Tried to extract

[I] [Variant] Improve `VariantArray` performance by storing the index of the metadata and value arrays [arrow-rs]

2025-07-12 Thread via GitHub
alamb opened a new issue, #7920: URL: https://github.com/apache/arrow-rs/issues/7920 Would it be worth it to store the index of the metadata and value arrays? This would speed up `metadata_field` and `value_field` functions _Originally posted by @Samyak2 in https://github.com/apache/
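A sketch of the proposal, assuming the array is backed by named child columns: look up the positions of `metadata` and `value` once at construction, so the accessors become a direct index rather than a search by name. `StructLike` and `CachedVariantArray` are simplified, hypothetical types, not the arrow `StructArray` or `VariantArray` API.

```rust
/// Hypothetical struct-like container: named child columns of binary values.
pub struct StructLike {
    pub names: Vec<String>,
    pub columns: Vec<Vec<Vec<u8>>>,
}

/// Caches the child positions so accessors skip the name lookup.
pub struct CachedVariantArray {
    inner: StructLike,
    metadata_idx: usize,
    value_idx: usize,
}

impl CachedVariantArray {
    pub fn try_new(inner: StructLike) -> Result<Self, String> {
        // Resolve each required child by name exactly once.
        let find = |name: &str| {
            inner
                .names
                .iter()
                .position(|n| n == name)
                .ok_or_else(|| format!("missing field '{}'", name))
        };
        let metadata_idx = find("metadata")?;
        let value_idx = find("value")?;
        Ok(Self { inner, metadata_idx, value_idx })
    }

    pub fn metadata_field(&self) -> &[Vec<u8>] {
        &self.inner.columns[self.metadata_idx]
    }

    pub fn value_field(&self) -> &[Vec<u8>] {
        &self.inner.columns[self.value_idx]
    }
}
```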

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905#discussion_r2202839050 ## parquet-variant-compute/src/from_json.rs: ## @@ -135,43 +68,38 @@ mod test { None, ]); let array_ref: ArrayRef = Arc::new(input); -

Re: [I] Provide a high-level Go equivalent to Python’s adbc_ingest [arrow-adbc]

2025-07-12 Thread via GitHub
Mandukhai-Alimaa commented on issue #3142: URL: https://github.com/apache/arrow-adbc/issues/3142#issuecomment-3065906585 I am working on this.

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905#discussion_r2202869809 ## parquet-variant-compute/src/variant_array.rs: ## @@ -0,0 +1,227 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] [Variant] Append complex variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7914: URL: https://github.com/apache/arrow-rs/pull/7914#discussion_r2202872795 ## parquet-variant/src/builder.rs: ## @@ -2170,4 +2251,45 @@ mod tests { let variant = Variant::try_new_with_metadata(metadata, &value).unwrap(); asse

Re: [PR] [Variant] Avoid superflous validation checks [arrow-rs]

2025-07-12 Thread via GitHub
alamb merged PR #7906: URL: https://github.com/apache/arrow-rs/pull/7906

Re: [I] [Variant] Remove superfluous check when validating monotonic offsets [arrow-rs]

2025-07-12 Thread via GitHub
alamb closed issue #7900: [Variant] Remove superfluous check when validating monotonic offsets URL: https://github.com/apache/arrow-rs/issues/7900
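The excerpt does not say which check was superfluous. One plausible case, assumed here: once offsets are verified to be non-decreasing, checking only the last offset against the buffer length bounds all of them, so a per-offset bounds check adds nothing. A sketch (`validate_monotonic_offsets` is a hypothetical name):

```rust
/// Check that `offsets` never decrease and that the last one fits in `len`.
/// Monotonicity means bounding the last offset bounds every offset.
pub fn validate_monotonic_offsets<I>(offsets: I, len: usize) -> Result<(), String>
where
    I: IntoIterator<Item = usize>,
{
    let mut iter = offsets.into_iter();
    let mut prev = match iter.next() {
        Some(first) => first,
        None => return Ok(()),
    };
    for next in iter {
        if next < prev {
            return Err(format!("offsets not monotonic: {} then {}", prev, next));
        }
        prev = next;
    }
    // A single bounds check on the final (largest) offset.
    if prev > len {
        return Err(format!("last offset {} exceeds buffer length {}", prev, len));
    }
    Ok(())
}
```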

Re: [PR] [Variant] Avoid superflous validation checks [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7906: URL: https://github.com/apache/arrow-rs/pull/7906#issuecomment-3065969207 Thanks again @friendlymatthew and @scovich

Re: [PR] [json] Coerce primitive numbers to string faster [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7819: URL: https://github.com/apache/arrow-rs/pull/7819#issuecomment-3065981599 > The reason why I am questioning the validity of these benchmark results is because the only bench that touches the code path of the function I changed is `bench_struct_list`, so I would

Re: [I] [Variant] Add low level support for shredding and unshredding [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on issue #7715: URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-3065983509 > Basically a kernel like > > /// Unshreds Variant > fn unshred_variant(input: VariantArray) -> VariantArray { ... } FYI @zeroshade I think said the Golang implementati

Re: [PR] [Variant] VariantBuilder with VariantMetadata instead of MetadataBuilder [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7915: URL: https://github.com/apache/arrow-rs/pull/7915#issuecomment-3065983997 > Ability for a builder to wrap an existing buffer (which a variant array builder could use to pack multiple variants into the same slice of memory) I have some proposal of that API

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905#issuecomment-3065994386 Given this PR seems to block several others, I plan to merge it once CI is complete

Re: [I] [Release] Revisit reproducible source archive verification [arrow]

2025-07-12 Thread via GitHub
kou commented on issue #47081: URL: https://github.com/apache/arrow/issues/47081#issuecomment-3066069536 Good point. I thought that hosted GitHub Actions runner is one of trusted hardware (we build and sign the official source archive on GitHub Actions runner now) but it may be ambit

Re: [PR] GH-47081: [Release] Fix reproducible build check on macOS [arrow]

2025-07-12 Thread via GitHub
kou commented on code in PR #47082: URL: https://github.com/apache/arrow/pull/47082#discussion_r2202931986 ## dev/release/verify-release-candidate.sh: ## @@ -789,14 +789,29 @@ ensure_source_directory() { if [ ! -d "${ARROW_SOURCE_DIR}" ]; then pushd $ARROW_TMPDIR

Re: [I] [Release] Unify GitHub token related environment variables [arrow]

2025-07-12 Thread via GitHub
amoeba commented on issue #47075: URL: https://github.com/apache/arrow/issues/47075#issuecomment-3066422368 Thanks for summarizing @kou. Any thoughts @raulcd @assignUser?

Re: [I] Improve release/post-release workflow [arrow-nanoarrow]

2025-07-12 Thread via GitHub
paleolimbot commented on issue #660: URL: https://github.com/apache/arrow-nanoarrow/issues/660#issuecomment-3066510037 ...also make a note that the email to the announce mailing list has to be in plain text or it will be rejected!

Re: [PR] chore(format): fix invalid Doxygen documentation [arrow-adbc]

2025-07-12 Thread via GitHub
rouault commented on PR #3141: URL: https://github.com/apache/arrow-adbc/pull/3141#issuecomment-3066306656 > Thanks! Can you copy this change to the other copy of the header to make pre-commit happy? done

Re: [PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2203045219 ## parquet-variant-compute/src/utils.rs: ## @@ -0,0 +1,48 @@ +use arrow::{ +array::{Array, ArrayRef, BinaryArray, StructArray}, +error::Result, +}; +use arrow_

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#issuecomment-3066330539 cc @scovich @alamb @samyak2

[PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew opened a new pull request, #7921: URL: https://github.com/apache/arrow-rs/pull/7921 # Which issue does this PR close? - Part of https://github.com/apache/arrow-rs/issues/7895 My initial PR is getting too large so I figured it would be better to split these up.

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#discussion_r2203047347 ## parquet-variant-compute/src/shredding.rs: ## @@ -0,0 +1,364 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#discussion_r2203047475 ## parquet-variant-compute/src/shredding.rs: ## @@ -0,0 +1,364 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#discussion_r2203048421 ## parquet-variant-compute/src/shredding.rs: ## @@ -0,0 +1,349 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] chore(format): fix invalid Doxygen documentation [arrow-adbc]

2025-07-12 Thread via GitHub
lidavidm merged PR #3141: URL: https://github.com/apache/arrow-adbc/pull/3141

Re: [PR] GH-47088: [CI][Dev] Fix shellcheck errors in the ci/scripts/shellcheck-integration_arrow.sh [arrow]

2025-07-12 Thread via GitHub
github-actions[bot] commented on PR #47089: URL: https://github.com/apache/arrow/pull/47089#issuecomment-306492 :warning: GitHub issue #47088 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-47088: [CI][Dev] Fix shellcheck errors in the ci/scripts/shellcheck-integration_arrow.sh [arrow]

2025-07-12 Thread via GitHub
hiroyuki-sato opened a new pull request, #47089: URL: https://github.com/apache/arrow/pull/47089 ### Rationale for this change This is the sub issue #44748. * SC2046: Quote this to prevent word splitting. * SC2086: Double quote to prevent globbing and word splitting. * SC2

Re: [I] [C++] Segfault when reading a Parquet file as a Dataset but not when read as an individual file [arrow]

2025-07-12 Thread via GitHub
pitrou commented on issue #36807: URL: https://github.com/apache/arrow/issues/36807#issuecomment-3064937418 @thisisnic Can this still be reproduced?

Re: [I] [Python][C++] Scanner crashing occasionally [arrow]

2025-07-12 Thread via GitHub
pitrou commented on issue #45017: URL: https://github.com/apache/arrow/issues/45017#issuecomment-3064942035 I'm curious, why do you think this is due to datasets? The traceback mostly shows RAY functions and, weirdly, some calls to `_PyThread_at_fork_reinit` and `pthread_exit`.

Re: [I] Ability to chunk download from object store [arrow-rs-object-store]

2025-07-12 Thread via GitHub
flaneur2020 commented on issue #274: URL: https://github.com/apache/arrow-rs-object-store/issues/274#issuecomment-3064796173 [SlateDB](https://github.com/slatedb/slatedb) currently uses a transparent [object_store wrapper](https://github.com/slatedb/slatedb/blob/main/slatedb/src/cached_obj
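A minimal sketch of the range splitting a chunked download needs: divide an object of `total` bytes into consecutive ranges of at most `chunk_size` bytes, each of which could then be fetched with its own ranged read. `chunk_ranges` is a hypothetical helper; how SlateDB's wrapper actually chunks requests is not shown in the excerpt.

```rust
use std::ops::Range;

/// Split `0..total` into consecutive ranges of at most `chunk_size` bytes.
pub fn chunk_ranges(total: u64, chunk_size: u64) -> Vec<Range<u64>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut ranges = Vec::new();
    let mut start: u64 = 0;
    while start < total {
        // The last chunk is shortened to end exactly at `total`.
        let end = start.saturating_add(chunk_size).min(total);
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```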

Re: [I] Unable to import arrow table to pandas if it has categorical columns with index types of unsigned ints [arrow]

2025-07-12 Thread via GitHub
antonioalegria commented on issue #47022: URL: https://github.com/apache/arrow/issues/47022#issuecomment-3064989184 Same issue, seems like this started in pyarrow 20.0

Re: [PR] [Variant] Add `VariantBuilder::new_with_buffers` to write to existing buffers [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7912: URL: https://github.com/apache/arrow-rs/pull/7912#discussion_r2202887789 ## parquet-variant/src/builder.rs: ## @@ -1916,6 +1989,80 @@ mod tests { assert_eq!(metadata.num_field_names(), 3); } +/// Test reusing buffers with
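A sketch of the buffer-reuse pattern behind `new_with_buffers`, simplified to a single value buffer: the builder appends after whatever the caller's buffer already holds and hands the buffer back with the byte range it wrote, so several variants can be packed into one allocation. `ReusingBuilder` is hypothetical and does not mirror the PR's actual signature.

```rust
use std::ops::Range;

/// Hypothetical builder that writes into a caller-provided buffer.
pub struct ReusingBuilder {
    value: Vec<u8>,
    // Where this builder's output begins inside `value`.
    start: usize,
}

impl ReusingBuilder {
    /// Take ownership of an existing buffer; its prior contents are kept.
    pub fn new_with_buffer(value: Vec<u8>) -> Self {
        let start = value.len();
        Self { value, start }
    }

    /// Append already-encoded bytes (a stand-in for encoding a real value).
    pub fn append_bytes(&mut self, bytes: &[u8]) {
        self.value.extend_from_slice(bytes);
    }

    /// Return the buffer and the range of bytes written by this builder.
    pub fn finish(self) -> (Vec<u8>, Range<usize>) {
        let end = self.value.len();
        (self.value, self.start..end)
    }
}
```

Passing the returned buffer to the next builder appends the next value after the previous one instead of allocating again.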

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb merged PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905#issuecomment-3066009201 I am happy to make PRs / address comments from anyone who might not have had a chance to comment -- just let me know

Re: [PR] Add `VariantArray` and `VariantArrayBuilder` for constructing Arrow Arrays of Variants [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7905: URL: https://github.com/apache/arrow-rs/pull/7905#issuecomment-3066008478 Thanks again for the quick reviews everyone!

Re: [PR] [Variant] VariantBuilder with VariantMetadata instead of MetadataBuilder [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7915: URL: https://github.com/apache/arrow-rs/pull/7915#discussion_r2202881293 ## parquet-variant/src/builder.rs: ## @@ -350,14 +378,59 @@ impl> FromIterator for MetadataBuilder { } } -impl> Extend for MetadataBuilder { +impl> Extend for De

Re: [PR] GH-47081: [Release] Fix reproducible build check on macOS [arrow]

2025-07-12 Thread via GitHub
kou commented on PR #47082: URL: https://github.com/apache/arrow/pull/47082#issuecomment-3066097367 @github-actions crossbow submit -g verify-rc-source --param release=21.0.0 --param rc=6

Re: [PR] GH-47081: [Release] Verify reproducible source build explicitly [arrow]

2025-07-12 Thread via GitHub
github-actions[bot] commented on PR #47082: URL: https://github.com/apache/arrow/pull/47082#issuecomment-3066099907 Revision: 868b13251f4850e9c63a0c562116ca1f66a21235 Submitted crossbow builds: [ursacomputing/crossbow @ actions-45858bad0e](https://github.com/ursacomputing/crossbow/bra

Re: [PR] feat(rust/core): add function to load driver manifests [arrow-adbc]

2025-07-12 Thread via GitHub
zeroshade merged PR #3099: URL: https://github.com/apache/arrow-adbc/pull/3099

[PR] chore(rust): bump the arrow-datafusion group across 1 directory with 2 updates [arrow-adbc]

2025-07-12 Thread via GitHub
dependabot[bot] opened a new pull request, #3144: URL: https://github.com/apache/arrow-adbc/pull/3144 Bumps the arrow-datafusion group with 2 updates in the /rust directory: [datafusion](https://github.com/apache/datafusion) and [datafusion-substrait](https://github.com/apache/datafusion).

Re: [PR] chore(rust): bump the arrow-datafusion group across 1 directory with 4 updates [arrow-adbc]

2025-07-12 Thread via GitHub
dependabot[bot] closed pull request #3111: chore(rust): bump the arrow-datafusion group across 1 directory with 4 updates URL: https://github.com/apache/arrow-adbc/pull/3111

Re: [PR] chore(rust): bump the arrow-datafusion group across 1 directory with 4 updates [arrow-adbc]

2025-07-12 Thread via GitHub
dependabot[bot] commented on PR #3111: URL: https://github.com/apache/arrow-adbc/pull/3111#issuecomment-3066112465 Looks like these dependencies are updatable in another way, so this is no longer needed.

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2203043113 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -168,7 +166,105 @@ impl VariantArrayBuilder { self.value_buffer.extend_from_slice(value);

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2203043166 ## parquet-variant/src/builder.rs: ## @@ -61,9 +61,21 @@ fn write_offset(buf: &mut Vec, value: usize, nbytes: u8) { buf.extend_from_slice(&bytes[..nbytes as usiz
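A sketch of what a `write_offset` with this signature likely does, assuming `bytes` is `value.to_le_bytes()`: append the low `nbytes` bytes of `value` in little-endian order. The debug assertions are additions for illustration, not taken from the PR.

```rust
/// Append the low `nbytes` bytes of `value`, least significant byte first.
pub fn write_offset(buf: &mut Vec<u8>, value: usize, nbytes: u8) {
    debug_assert!((1..=4).contains(&nbytes), "offset width must be 1..=4 bytes");
    // For widths below 4 bytes, the value must fit in the chosen width.
    debug_assert!(
        nbytes >= 4 || (value >> (8 * u32::from(nbytes))) == 0,
        "value does not fit in nbytes"
    );
    let bytes = value.to_le_bytes();
    buf.extend_from_slice(&bytes[..nbytes as usize]);
}
```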

Re: [PR] GH-47045: [CI][C++] Use Fedora 42 instead of 39 [arrow]

2025-07-12 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #47046: URL: https://github.com/apache/arrow/pull/47046#issuecomment-3066611578 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit bd24668c455fa0cb31caa31a9dc754d77a244250. There were no

Re: [PR] GH-47005: [C++] Disable exporting CMake packages [arrow]

2025-07-12 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #47006: URL: https://github.com/apache/arrow/pull/47006#issuecomment-3066612582 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit f52d81bca789213136f2e931b3d0809db0cc611e. There were no

Re: [PR] fix(go/adbc/drivermgr): properly vendor toml++ [arrow-adbc]

2025-07-12 Thread via GitHub
lidavidm merged PR #3138: URL: https://github.com/apache/arrow-adbc/pull/3138

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202913815 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -168,7 +166,105 @@ impl VariantArrayBuilder { self.value_buffer.extend_from_slice(value); }

Re: [PR] GH-47081: [Release] Verify reproducible source build explicitly [arrow]

2025-07-12 Thread via GitHub
kou commented on PR #47082: URL: https://github.com/apache/arrow/pull/47082#issuecomment-3066114688 @github-actions crossbow submit -g verify-rc-source --param release=21.0.0 --param rc=6

Re: [PR] GH-47081: [Release] Verify reproducible source build explicitly [arrow]

2025-07-12 Thread via GitHub
github-actions[bot] commented on PR #47082: URL: https://github.com/apache/arrow/pull/47082#issuecomment-3066117192 Revision: d8c6cc5ebc8d010c86a69ae9ab2dbeb8ac41682a Submitted crossbow builds: [ursacomputing/crossbow @ actions-47ebd1ec36](https://github.com/ursacomputing/crossbow/bra

Re: [I] [CI][C++] Use Fedora 42 instead of 39 [arrow]

2025-07-12 Thread via GitHub
kou commented on issue #47045: URL: https://github.com/apache/arrow/issues/47045#issuecomment-3066117405 Issue resolved by pull request 47046 https://github.com/apache/arrow/pull/47046

Re: [PR] GH-47005: [C++] Disable exporting CMake packages [arrow]

2025-07-12 Thread via GitHub
kou commented on PR #47006: URL: https://github.com/apache/arrow/pull/47006#issuecomment-3066117542 +1

Re: [I] [C++] Disable exporting CMake packages [arrow]

2025-07-12 Thread via GitHub
kou commented on issue #47005: URL: https://github.com/apache/arrow/issues/47005#issuecomment-3066117694 Issue resolved by pull request 47006 https://github.com/apache/arrow/pull/47006

Re: [PR] GH-47045: [CI][C++] Use Fedora 42 instead of 39 [arrow]

2025-07-12 Thread via GitHub
kou commented on PR #47046: URL: https://github.com/apache/arrow/pull/47046#issuecomment-3066117228 +1

Re: [PR] GH-47045: [CI][C++] Use Fedora 42 instead of 39 [arrow]

2025-07-12 Thread via GitHub
kou merged PR #47046: URL: https://github.com/apache/arrow/pull/47046

Re: [PR] GH-47005: [C++] Disable exporting CMake packages [arrow]

2025-07-12 Thread via GitHub
kou merged PR #47006: URL: https://github.com/apache/arrow/pull/47006

Re: [PR] [Variant] Support appending complex variants in `VariantBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7914: URL: https://github.com/apache/arrow-rs/pull/7914#discussion_r2202823069 ## parquet-variant/src/builder.rs: ## @@ -213,14 +215,14 @@ impl ValueBuffer { Variant::Binary(v) => self.append_binary(v), Variant::String(s

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#discussion_r2202958731 ## parquet-variant-compute/src/variant_array_builder.rs: ## @@ -168,7 +166,105 @@ impl VariantArrayBuilder { self.value_buffer.extend_from_slice(value);

Re: [PR] Convert JSON to VariantArray without copying [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on PR #7911: URL: https://github.com/apache/arrow-rs/pull/7911#issuecomment-3066018197 > My only concern is whether we might ever need to support a builder that isn't backed by Vec? I'm guessing not, but wanted to double check. I think eventually we might, but I think

Re: [PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2202901396 ## parquet-variant-compute/src/variant_get.rs: ## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ +array::{ +Array, ArrayRef, ArrowPrimitiveType, Bina

Re: [PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
Samyak2 commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2203142467 ## parquet-variant-compute/src/variant_get.rs: ## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ +array::{ +Array, ArrayRef, ArrowPrimitiveType, Bi

Re: [PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
Samyak2 commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2203143325 ## parquet-variant-compute/src/variant_get.rs: ## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ +array::{ +Array, ArrayRef, ArrowPrimitiveType, Bi

Re: [PR] variant_get compute kernel [arrow-rs]

2025-07-12 Thread via GitHub
Samyak2 commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2203142685 ## parquet-variant-compute/src/utils.rs: ## @@ -0,0 +1,48 @@ +use arrow::{ +array::{Array, ArrayRef, BinaryArray, StructArray}, +error::Result, +}; +use arrow_

Re: [PR] [Variant] test: add variant object tests with different sizes [arrow-rs]

2025-07-12 Thread via GitHub
alamb commented on code in PR #7896: URL: https://github.com/apache/arrow-rs/pull/7896#discussion_r2202886096 ## parquet-variant/src/variant/object.rs: ## @@ -618,4 +620,112 @@ mod tests { ArrowError::InvalidArgumentError(ref msg) if msg.contains("Tried to extract

Re: [PR] [Variant] VariantBuilder with VariantMetadata instead of MetadataBuilder [arrow-rs]

2025-07-12 Thread via GitHub
scovich commented on code in PR #7915: URL: https://github.com/apache/arrow-rs/pull/7915#discussion_r2202981997 ## parquet-variant/src/builder.rs: ## @@ -350,14 +378,59 @@ impl> FromIterator for MetadataBuilder { } } -impl> Extend for MetadataBuilder { +impl> Extend for

Re: [I] [Release] Unify GitHub token related environment variables [arrow]

2025-07-12 Thread via GitHub
kou commented on issue #47075: URL: https://github.com/apache/arrow/issues/47075#issuecomment-3066180155 OK. Summary: * We keep using `GH_TOKEN` in `dev/release/.env{,.example}` * We keep using `ARROW_GITHUB_API_TOKEN` in `dev/merge_arrow_pr.py` * Archery accepts `ARROW_GITHUB_AP

Re: [PR] GH-47081: [Release] Verify reproducible source build explicitly [arrow]

2025-07-12 Thread via GitHub
kou commented on PR #47082: URL: https://github.com/apache/arrow/pull/47082#issuecomment-3066186243 I've changed to use the explicit `TEST_SOURCE_REPRODUCIBLE` approach based on the https://github.com/apache/arrow/issues/47081#issuecomment-3064062388 discussion.

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#discussion_r2203048709 ## parquet-variant-compute/src/shredding.rs: ## @@ -0,0 +1,349 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] [Variant] Define shredding schema for `VariantArrayBuilder` [arrow-rs]

2025-07-12 Thread via GitHub
friendlymatthew commented on code in PR #7921: URL: https://github.com/apache/arrow-rs/pull/7921#discussion_r2203048613 ## parquet-variant-compute/src/shredding.rs: ## @@ -0,0 +1,349 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [I] [Release] Revisit reproducible source archive verification [arrow]

2025-07-12 Thread via GitHub
amoeba commented on issue #47081: URL: https://github.com/apache/arrow/issues/47081#issuecomment-3066352793 Sounds good. Thanks @kou.