Re: [PR] MINOR: add zanmato1984 and ZhangHuiGui in collaborators list [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41544: URL: https://github.com/apache/arrow/pull/41544#issuecomment-2095281126 > As a reminder, the collaborators list is limited to 10 entries. Aha, it already has 7 collaborators now... I'll let the community decide whether we should add them -- This is an

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-05-06 Thread via GitHub
lidavidm commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2095299693 The ADBC encoding allows the producer to mark statistics as exact/approximate, fwiw > In (1) https://github.com/apache/arrow/issues/38837#issuecomment-2088101530 , would

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-05-06 Thread via GitHub
pitrou commented on PR #41335: URL: https://github.com/apache/arrow/pull/41335#issuecomment-2095424705 I don't mind either approach as long as it doesn't add an environment variable to avoid crashes. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] MINOR: add jbonofre in collaborators list [arrow]

2024-05-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41528: URL: https://github.com/apache/arrow/pull/41528#issuecomment-2095323003 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 4cf44b4bc3ab053b03c937d3327d43c105790462. There were 4

Re: [PR] GH-40282: [Python] Use C++ type traits [arrow]

2024-05-06 Thread via GitHub
AlenkaF commented on PR #40761: URL: https://github.com/apache/arrow/pull/40761#issuecomment-2095417100 Hi @llama90, added a comment - hope it will help to get this PR moving forward. The code also needs a rebase.

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
AlenkaF commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095474115 I can reproduce the issue locally. @wgtmac @pitrou would you have time and idea on what could be the issue that both `dataset.to_table()` or `dataset.to_batches()` returns the

Re: [PR] Add support for flexible column lengths [arrow-rs]

2024-05-06 Thread via GitHub
Jefffrey commented on code in PR #5679: URL: https://github.com/apache/arrow-rs/pull/5679#discussion_r1590730608 ## arrow-csv/src/reader/records.rs: ## @@ -172,6 +188,11 @@ impl RecordDecoder { self.num_rows = 0; } +/// Sets the decoder to allow rows with

Re: [PR] Add support for flexible column lengths [arrow-rs]

2024-05-06 Thread via GitHub
Jefffrey commented on code in PR #5679: URL: https://github.com/apache/arrow-rs/pull/5679#discussion_r1590735760 ## arrow-csv/src/reader/mod.rs: ## @@ -265,6 +266,14 @@ impl Format { self } +/// Whether to allow flexible lengths for records. Review Comment:

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2095275677 Nice. I mean, should "distinct count" (or NDV) be "estimated"?

Re: [I] [Java] Question about new driver [arrow-adbc]

2024-05-06 Thread via GitHub
lidavidm commented on issue #1822: URL: https://github.com/apache/arrow-adbc/issues/1822#issuecomment-2095315365 For the time being, Java can only use drivers written in Java unfortunately. (Also, it appears Athena is row-oriented, so I'm not sure you could expect much speedup?)

Re: [I] [object_store] Self-signed certificates used by OneLake (Azure) [arrow-rs]

2024-05-06 Thread via GitHub
martroben commented on issue #5696: URL: https://github.com/apache/arrow-rs/issues/5696#issuecomment-2095350410 @hnasrullakhan, yes same for me. deltalake-0.16.1 works, but versions starting from 0.16.2 don't. I also included this new information under [deltalake issue

Re: [I] [Java] Question about new driver [arrow-adbc]

2024-05-06 Thread via GitHub
HaoXuAI commented on issue #1822: URL: https://github.com/apache/arrow-adbc/issues/1822#issuecomment-2095376184 Athena can use `parquet` as storage format, so it should work with the Arrow columnar format? I'll look deeper into it to see if that could work. I'm not familiar with JNI, but interested

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #40998: URL: https://github.com/apache/arrow/pull/40998#issuecomment-2095375179 @pitrou do you have any comments on this? If there are no objections, I'll merge this patch in 3 days

Re: [PR] GH-40282: [Python] Use C++ type traits [arrow]

2024-05-06 Thread via GitHub
AlenkaF commented on code in PR #40761: URL: https://github.com/apache/arrow/pull/40761#discussion_r1590675494 ## python/pyarrow/types.py: ## @@ -20,295 +20,54 @@ from pyarrow.lib import (is_boolean_value, # noqa is_integer_value, -

Re: [PR] feat(go/adbc/driver/snowflake): support parameter binding [arrow-adbc]

2024-05-06 Thread via GitHub
lidavidm commented on PR #1808: URL: https://github.com/apache/arrow-adbc/pull/1808#issuecomment-2095311755 @zeroshade can I just get a final look here once you're up?

Re: [I] [Java] Java Dataset API ScanOptions expansion [arrow]

2024-05-06 Thread via GitHub
jinchengchenghh commented on issue #28866: URL: https://github.com/apache/arrow/issues/28866#issuecomment-2095354665 I also have the same requirement. I use dataset scan CSV, but cannot set the parseOptions and readOptions by java ScanOptions, I find ScanBuilder can set

[PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU opened a new pull request, #41546: URL: https://github.com/apache/arrow/pull/41546 ### Rationale for this change See https://github.com/apache/arrow/issues/41545 ### What changes are included in this PR? Remove `encoded_size_` and uses

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2095410053 :warning: GitHub issue #41545 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590752421 ## cpp/src/parquet/encoding.cc: ## @@ -2740,13 +2740,12 @@ class DeltaLengthByteArrayEncoder : public EncoderImpl, : EncoderImpl(descr,

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590766609 ## cpp/src/parquet/encoding.cc: ## @@ -2803,15 +2806,15 @@ void DeltaLengthByteArrayEncoder::Put(const T* src, int num_values) { const int batch_size =

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590767517 ## cpp/src/parquet/encoding.cc: ## @@ -2803,15 +2806,15 @@ void DeltaLengthByteArrayEncoder::Put(const T* src, int num_values) { const int batch_size =

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095564429 ``` def test_encrypted_parquet_dataset(): source_enc_parquet = "./test.enc.parquet" crypt_factory = pe.CryptoFactory(kms_client_factory) encryption_config =

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2096173695 The error does not appear from `print(dataset.count_rows())` and it successfully prints `18` on my end. The issue comes when the code snippet tries to create the reader again from

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-05-06 Thread via GitHub
pdet commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2095548445 Exactly. In general, systems don't provide exact distinct statistics because these are rather expensive to calculate. Hence, they use approximate strategies. In DuckDB's

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
RyogaWan commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095591833 > ``` > def test_encrypted_parquet_dataset(): > source_enc_parquet = "./test.enc.parquet" > crypt_factory = pe.CryptoFactory(kms_client_factory) >

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
rok commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2095676239 The Appveyor issue appears on master as well.

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2096082903 > Hmm... what is estimated exactly? Currently the encoder estimates "the size of the values after encoding but before compression". When using DeltaLengthByteArray, we use a smaller

Re: [PR] MINOR: [C++][Parquet] fix dict_length for ReadDictionary when not having dict [arrow]

2024-05-06 Thread via GitHub
pitrou commented on PR #41344: URL: https://github.com/apache/arrow/pull/41344#issuecomment-2096122116 It would have been nice to add a test for this.

[PR] Fix Rustdocs (amd64, nightly)" CI check [arrow-rs]

2024-05-06 Thread via GitHub
alamb opened a new pull request, #5727: URL: https://github.com/apache/arrow-rs/pull/5727 # Which issue does this PR close? Closes https://github.com/apache/arrow-rs/issues/5725 # Rationale for this change # What changes are included in this PR? #

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
rok commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590748410 ## cpp/src/parquet/encoding.cc: ## @@ -2740,13 +2740,12 @@ class DeltaLengthByteArrayEncoder : public EncoderImpl, : EncoderImpl(descr,

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
rok commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590776471 ## cpp/src/parquet/encoding.cc: ## @@ -2740,13 +2740,12 @@ class DeltaLengthByteArrayEncoder : public EncoderImpl, : EncoderImpl(descr,

Re: [I] "Archery test With other arrows" integration tests are failing [arrow-rs]

2024-05-06 Thread via GitHub
Jefffrey commented on issue #5719: URL: https://github.com/apache/arrow-rs/issues/5719#issuecomment-2095790573 Seems to be an issue with the `apache/arrow-dev:amd64-conda-integration` Docker image, specifically with conda? Relevant discussions: -

Re: [I] [CI][C++] RapidJSON related build failure on AppVeyor [arrow]

2024-05-06 Thread via GitHub
xhochy commented on issue #41509: URL: https://github.com/apache/arrow/issues/41509#issuecomment-2095881339 https://github.com/conda-forge/rapidjson-feedstock/pull/9

Re: [I] [Java] Java Dataset API ScanOptions expansion [arrow]

2024-05-06 Thread via GitHub
westonpace commented on issue #28866: URL: https://github.com/apache/arrow/issues/28866#issuecomment-2095929942 Is the JNI -> dataset API using substrait today? If so, then I think 1 is preferred. However, the user should not provide `AdvancedExtension`, this should be taken care of

Re: [PR] MINOR: [R] fix no visible global function definition: left_join [arrow]

2024-05-06 Thread via GitHub
nealrichardson merged PR #41542: URL: https://github.com/apache/arrow/pull/41542

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on code in PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#discussion_r1591039099 ## arrow-data/src/byte_view.rs: ## @@ -15,10 +15,453 @@ // specific language governing permissions and limitations // under the License. -use arrow_buffer::Buffer;

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2096069246 Sure, I can add it for this case (DeltaLengthByteArrayEncoder statistics). However, this value is "estimated"; we use it when a page size limit is set, to check whether the limit is exceeded. How

Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

2024-05-06 Thread via GitHub
pitrou commented on code in PR #41187: URL: https://github.com/apache/arrow/pull/41187#discussion_r1591080513 ## docs/source/cpp/parquet.rst: ## @@ -542,6 +542,19 @@ As an example, when serializing an Arrow LargeList to Parquet: :func:`ArrowWriterProperties::store_schema`

Re: [PR] GH-41035: [C++] Add a grouper benchmark for preventing performance regression [arrow]

2024-05-06 Thread via GitHub
pitrou commented on PR #41036: URL: https://github.com/apache/arrow/pull/41036#issuecomment-2096160926 @ZhangHuiGui I have lost track of what this PR's status is. Does it need to wait for other PRs? Do you intend to bring further changes to it?

Re: [PR] Fix `GenericListBuilder` test typo [arrow-rs]

2024-05-06 Thread via GitHub
Jefffrey merged PR #5724: URL: https://github.com/apache/arrow-rs/pull/5724

Re: [I] [CI][C++] RapidJSON related build failure on AppVeyor [arrow]

2024-05-06 Thread via GitHub
xhochy commented on issue #41509: URL: https://github.com/apache/arrow/issues/41509#issuecomment-2095868288 I'll update the conda-forge package with the patch

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
ariesdevil commented on code in PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#discussion_r1591029570 ## arrow-data/src/byte_view.rs: ## @@ -15,10 +15,453 @@ // specific language governing permissions and limitations // under the License. -use

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
pitrou commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2096072831 Hmm... what is estimated exactly? Are we talking about the statistics written to the Parquet file, or about something else?

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2096073066 I can add a test for ByteArray + Non-Dict encoding as a first implementation

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
ariesdevil commented on code in PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#discussion_r1591031824 ## arrow-data/src/byte_view.rs: ## @@ -15,10 +15,453 @@ // specific language governing permissions and limitations // under the License. -use

Re: [I] "Rustdocs are clean (amd64, nightly)" CI check is failing [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on issue #5725: URL: https://github.com/apache/arrow-rs/issues/5725#issuecomment-2096085735 Thank you for filing this @Jefffrey

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#issuecomment-2096087347 CI is failing due to https://github.com/apache/arrow-rs/issues/5725 and https://github.com/apache/arrow-rs/issues/5719 But otherwise I think this PR is ready for review

[PR] Update brotli requirement from 5.0 to 6.0 [arrow-rs]

2024-05-06 Thread via GitHub
dependabot[bot] opened a new pull request, #5726: URL: https://github.com/apache/arrow-rs/pull/5726 Updates the requirements on [brotli](https://github.com/dropbox/rust-brotli) to permit the latest version. Commits

Re: [PR] GH-39798: [C++] Optimize Take for fixed-size types including nested fixed-size lists [arrow]

2024-05-06 Thread via GitHub
pitrou commented on code in PR #41297: URL: https://github.com/apache/arrow/pull/41297#discussion_r1591058749 ## cpp/src/arrow/util/fixed_width_internal.h: ## @@ -0,0 +1,307 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095535644 From the message `OSError: RowGroup is noted as encrypted but no file decryptor`, it seems that the decryption config is not correctly created. I'll take some time to investigate

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
rok commented on code in PR #41546: URL: https://github.com/apache/arrow/pull/41546#discussion_r1590752598 ## cpp/src/parquet/encoding.cc: ## @@ -2768,6 +2767,10 @@ class DeltaLengthByteArrayEncoder : public EncoderImpl, return Status::Invalid(

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on code in PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#discussion_r1591006739 ## arrow-data/src/transform/mod.rs: ## @@ -178,13 +178,17 @@ fn build_extend_view(array: , buffer_offset: u32) -> Extend { mutable

Re: [PR] [C++] Thirdparty: Upgrade xsimd to 13.0.0 [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41548: URL: https://github.com/apache/arrow/pull/41548#issuecomment-2096062849 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue

[PR] [C++] Thirdparty: Upgrade xsimd to 13.0.0 [arrow]

2024-05-06 Thread via GitHub
mapleFU opened a new pull request, #41548: URL: https://github.com/apache/arrow/pull/41548 ### Rationale for this change Arrow now uses xsimd 9.0.1, currently, some conversion for batch is now support in neon, see:

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
tolleybot commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2096090429 > From the message `OSError: RowGroup is noted as encrypted but no file decryptor`, it seems that the decryption config is not correctly created. I'll take some time to

Re: [I] Implement `StringViewArray` and `BinaryViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on issue #5374: URL: https://github.com/apache/arrow-rs/issues/5374#issuecomment-2096088717 BTW if anyone is interested, I have a proposal for how to improve manipulating the views in Rust https://github.com/apache/arrow-rs/pull/5619 (I really enjoy the rust type system in

Re: [PR] GH-41193: [C++] Remove some unnecessary function (FromColumnMetadataVector) calls in swiss_join [arrow]

2024-05-06 Thread via GitHub
pitrou commented on code in PR #41194: URL: https://github.com/apache/arrow/pull/41194#discussion_r1591091462 ## cpp/src/arrow/compute/row/row_internal.h: ## @@ -165,7 +165,7 @@ class ARROW_EXPORT RowTableImpl { /// \brief Initialize a row array for use /// /// This

Re: [PR] GH-40282: [Python] Use C++ type traits [arrow]

2024-05-06 Thread via GitHub
llama90 commented on code in PR #40761: URL: https://github.com/apache/arrow/pull/40761#discussion_r1591109342 ## python/pyarrow/includes/libarrow.pxd: ## @@ -3026,3 +3023,34 @@ cdef extern from "arrow/python/udf.h" namespace "arrow::py" nogil: cdef extern from

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095600750 You may use `FOOTER_KEY_NAME` as the key name, but the column names should be wrapped in a list. And please also check if the column name (i.e. rb.schema.names) is correct.

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-05-06 Thread via GitHub
pitrou commented on code in PR #40998: URL: https://github.com/apache/arrow/pull/40998#discussion_r1590947251 ## cpp/src/arrow/compute/row/compare_internal.h: ## @@ -92,7 +98,8 @@ class ARROW_EXPORT KeyCompare { static uint32_t NullUpdateColumnToRowImp_avx2( uint32_t

Re: [PR] GH-41547: [C++] Thirdparty: Upgrade xsimd to 13.0.0 [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41548: URL: https://github.com/apache/arrow/pull/41548#issuecomment-2096071062 :warning: GitHub issue #41547 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] Encapsulate `View` logic for `GenericByteViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on code in PR #5619: URL: https://github.com/apache/arrow-rs/pull/5619#discussion_r1591046681 ## arrow-data/src/byte_view.rs: ## @@ -15,10 +15,453 @@ // specific language governing permissions and limitations // under the License. -use arrow_buffer::Buffer;

Re: [I] [Python][Dataset] to_batches crash after calling 'count_rows' using dataset to read encrypted parquet [arrow]

2024-05-06 Thread via GitHub
RyogaWan commented on issue #41431: URL: https://github.com/apache/arrow/issues/41431#issuecomment-2095608278 > You may use `FOOTER_KEY_NAME` as the key name, but the column names should be wrapped in the list. And please also check if the column name (i.e. rb.schema.names) is correct.

[I] "Rustdocs are clean (amd64, nightly)" CI check is failing [arrow-rs]

2024-05-06 Thread via GitHub
Jefffrey opened a new issue, #5725: URL: https://github.com/apache/arrow-rs/issues/5725 **Describe the bug** CI failing on unrelated PRs: https://github.com/apache/arrow-rs/actions/runs/8956610392/job/24627590071?pr=5679 ``` Checking parquet v51.0.0

Re: [PR] GH-41545: [C++][Parquet] DeltaLengthByteArrayEncoder add estimated size when Put(Array) [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41546: URL: https://github.com/apache/arrow/pull/41546#issuecomment-2096441928 @pitrou I added a test just in the "TEST({DeltaLengthByte|Plain}ArrayEncodingAdHoc, ArrowBinaryDirectPut)" case

[PR] MINOR: [Java] Bump com.google.api.grpc:proto-google-common-protos from 2.37.1 to 2.39.0 in /java [arrow]

2024-05-06 Thread via GitHub
dependabot[bot] opened a new pull request, #41555: URL: https://github.com/apache/arrow/pull/41555 Bumps [com.google.api.grpc:proto-google-common-protos](https://github.com/googleapis/sdk-platform-java) from 2.37.1 to 2.39.0. Release notes Sourced from

[I] Use File Format quote when inferring the schema for CSVFormat [arrow-rs]

2024-05-06 Thread via GitHub
joao-p-pereira opened a new issue, #5729: URL: https://github.com/apache/arrow-rs/issues/5729 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** When using ListingOptions to infer the schema of a ListingTableUrl the result

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591385343 ## cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc: ## @@ -324,261 +326,109 @@ namespace { using TakeState = OptionsWrapper; //

Re: [PR] GH-41547: [C++] Thirdparty: Upgrade xsimd to 13.0.0 [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41548: URL: https://github.com/apache/arrow/pull/41548#issuecomment-2096645750 @github-actions crossbow submit -g cpp -g wheel

Re: [I] FlightSQL Stateless Prepared Statements handling of `Any` protobuf messages [arrow]

2024-05-06 Thread via GitHub
zeroshade commented on issue #41556: URL: https://github.com/apache/arrow/issues/41556#issuecomment-2096706042 @matthewmturner I just want to clarify something, the response from `DoPut` is a `PutResult` message that contains a single `bytes` field named `app_metadata`. In the case of the

Re: [PR] feat(go/adbc/driver/snowflake): support parameter binding [arrow-adbc]

2024-05-06 Thread via GitHub
zeroshade commented on code in PR #1808: URL: https://github.com/apache/arrow-adbc/pull/1808#discussion_r1591109804 ## go/adbc/driver/snowflake/binding.go: ## @@ -0,0 +1,141 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
pitrou commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591132130 ## cpp/src/arrow/util/gather_internal.h: ## @@ -0,0 +1,287 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] Fix Rustdocs (amd64, nightly)" CI check [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on code in PR #5727: URL: https://github.com/apache/arrow-rs/pull/5727#discussion_r1591169135 ## arrow-buffer/src/alloc/alignment.rs: ## @@ -80,15 +80,6 @@ pub const ALIGNMENT: usize = 1 << 5; #[cfg(target_arch = "sparc64")] pub const ALIGNMENT: usize = 1 <<

Re: [PR] Support casting `StringView`/`BinaryView` --> `StringArray`/`BinaryArray`. [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on PR #5704: URL: https://github.com/apache/arrow-rs/pull/5704#issuecomment-2096291283 Archery test is failing via https://github.com/apache/arrow-rs/issues/5719

Re: [I] "Rustdocs are clean (amd64, nightly)" CI check is failing [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on issue #5725: URL: https://github.com/apache/arrow-rs/issues/5725#issuecomment-2096289967 Proposed PR to fix: https://github.com/apache/arrow-rs/pull/5727

[PR] feat(python): Add copy_into() to CBufferView [arrow-nanoarrow]

2024-05-06 Thread via GitHub
paleolimbot opened a new pull request, #455: URL: https://github.com/apache/arrow-nanoarrow/pull/455 This is the non-bitmap equivalent of #450, useful for the same purpose (concatenating one big data buffer from chunks).

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591376011 ## cpp/src/arrow/util/gather_internal.h: ## @@ -0,0 +1,287 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591395174 ## cpp/src/arrow/util/gather_internal.h: ## @@ -0,0 +1,287 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-06 Thread via GitHub
zeroshade commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1591178649 ## cpp/src/arrow/array/data.cc: ## @@ -224,6 +224,41 @@ int64_t ArrayData::ComputeLogicalNullCount() const { return ArraySpan(*this).ComputeLogicalNullCount(); }

Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

2024-05-06 Thread via GitHub
mapleFU commented on PR #41187: URL: https://github.com/apache/arrow/pull/41187#issuecomment-2096363623 @github-actions crossbow submit preview-docs

Re: [PR] GH-41343: [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on PR #41345: URL: https://github.com/apache/arrow/pull/41345#issuecomment-2096390796 I will merge this by the end of this week if there is no objection.

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591380457 ## cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc: ## @@ -324,261 +326,109 @@ namespace { using TakeState = OptionsWrapper; //

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591381299 ## cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc: ## @@ -324,261 +326,109 @@ namespace { using TakeState = OptionsWrapper; //

Re: [PR] GH-41547: [C++] Thirdparty: Upgrade xsimd to 13.0.0 [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41548: URL: https://github.com/apache/arrow/pull/41548#issuecomment-2096652566 Revision: 362872a4e415d600db3b5f34efb14680d000d1e4 Submitted crossbow builds: [ursacomputing/crossbow @

Re: [PR] Fix Rustdocs (amd64, nightly)" CI check [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on PR #5727: URL: https://github.com/apache/arrow-rs/pull/5727#issuecomment-2096284697 The failing test seems to be fixed: https://github.com/apache/arrow-rs/actions/runs/8970828470/job/24635202345?pr=5727 The integration test failure is tracked via

Re: [I] Add `gc` garbage collector support for `StringViewArray` and `BinaryViewArray` [arrow-rs]

2024-05-06 Thread via GitHub
alamb commented on issue #5513: URL: https://github.com/apache/arrow-rs/issues/5513#issuecomment-2096301581 BTW @RinChanNOWWW has implemented `StringViewArray` / `BinaryViewArray` --> `StringArray` / `BinaryArray` in https://github.com/apache/arrow-rs/pull/5704 I think the same

Re: [PR] GH-41431: [C++][Parquet][Dataset] Fix repeated scan on encrypted dataset [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41550: URL: https://github.com/apache/arrow/pull/41550#issuecomment-2096347303 :warning: GitHub issue #41431 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-41431: [C++][Parquet][Dataset] Fix repeated scan on encrypted dataset [arrow]

2024-05-06 Thread via GitHub
wgtmac commented on PR #41550: URL: https://github.com/apache/arrow/pull/41550#issuecomment-2096349580 @pitrou @jorisvandenbossche Would you mind taking a look at this?

[PR] MINOR: [Go] Bump golang.org/x/sys from 0.19.0 to 0.20.0 in /go [arrow]

2024-05-06 Thread via GitHub
dependabot[bot] opened a new pull request, #41554: URL: https://github.com/apache/arrow/pull/41554 Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.19.0 to 0.20.0. Commits: 7d69d98 (https://github.com/golang/sys/commit/7d69d983c4522784860c781a0d7b80408fdc0cd1)

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591368377 ## cpp/src/arrow/util/gather_internal.h: ## @@ -0,0 +1,287 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591382562 ## cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc: ## @@ -324,261 +326,109 @@ namespace { using TakeState = OptionsWrapper; //

Re: [PR] GH-39858: [C++][Device] Add Copy/View slice functions to MemoryManager [arrow]

2024-05-06 Thread via GitHub
zeroshade commented on PR #41477: URL: https://github.com/apache/arrow/pull/41477#issuecomment-2096211969 That's my understanding. @felipecrv that sound about right to you?

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-06 Thread via GitHub
zeroshade commented on PR #40807: URL: https://github.com/apache/arrow/pull/40807#issuecomment-2096303239 @jorisvandenbossche @pitrou any further comments?

Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #41187: URL: https://github.com/apache/arrow/pull/41187#issuecomment-2096372219 Revision: 946a4bf202f2f591ec014958da0d255a5e7d Submitted crossbow builds: [ursacomputing/crossbow @

[PR] Refactor to share code between do_put and do_exchange calls [arrow-rs]

2024-05-06 Thread via GitHub
opensourcegeek opened a new pull request, #5728: URL: https://github.com/apache/arrow-rs/pull/5728 # Which issue does this PR close? Extends on work done for #3462 - internal refactor # Rationale for this change Internal refactor to allow sharing of the

Re: [PR] MINOR: [R] fix no visible global function definition: left_join [arrow]

2024-05-06 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41542: URL: https://github.com/apache/arrow/pull/41542#issuecomment-2096496202 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit d10ebf055a393c94a693097db1dca08ff86745bd. There were

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on PR #41373: URL: https://github.com/apache/arrow/pull/41373#issuecomment-2096599183 > I'm skeptical that you want to reuse this for Filter, unless you add Gather methods for batch selection. For Filter performance, it is essential to write out ranges of selected

Re: [PR] MINOR: [Java] Reduce enum array allocation [arrow]

2024-05-06 Thread via GitHub
laurentgo commented on PR #41533: URL: https://github.com/apache/arrow/pull/41533#issuecomment-2096607451 > We may want to optimize body of the noneMatch() call to use a case-insensitive TreeSet if this is called frequently (instead of searching linearly for each key as we're currently

Re: [PR] GH-41301: [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type [arrow]

2024-05-06 Thread via GitHub
felipecrv commented on code in PR #41373: URL: https://github.com/apache/arrow/pull/41373#discussion_r1591414022 ## cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc: ## @@ -324,261 +326,109 @@ namespace { using TakeState = OptionsWrapper; //

Re: [I] Add `.arc()` method for types that are typically used as *Refs [arrow-rs]

2024-05-06 Thread via GitHub
paddyhoran commented on issue #5714: URL: https://github.com/apache/arrow-rs/issues/5714#issuecomment-2096213221 I think I agree with @tustvold here. I would be someone that theoretically would benefit from this but have never missed it. Also, there is a solution via the helper trait.
