Re: [I] Coerce parquet int96 timestamps to microsecond precision [arrow-rs]

2024-04-21 Thread via GitHub
ion-elgreco commented on issue #5655: URL: https://github.com/apache/arrow-rs/issues/5655#issuecomment-2067963356 @mapleFU ![image](https://github.com/apache/arrow-rs/assets/15728914/df18e762-0063-4384-bb78-ca10c7d1ba40) https://spark.apache.org/docs/3.5.1/configuration.html#content

Re: [PR] GH-41314: [CI][Python] Add a job on ARM64 macOS [arrow]

2024-04-21 Thread via GitHub
llama90 commented on PR #41313: URL: https://github.com/apache/arrow/pull/41313#issuecomment-2067965402 Hello, @kou As you mentioned, I have created an issue for enabling ccache. Does this align with what you mentioned? Thank you.

Re: [PR] GH-40069: [C++] Make scalar scratch space immutable after initialization [arrow]

2024-04-21 Thread via GitHub
zanmato1984 commented on PR #40237: URL: https://github.com/apache/arrow/pull/40237#issuecomment-2068522366 Hi @bkietz , shall we move on with this PR? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] GH-41327: [Ruby] Show type name in Arrow::Table#to_s [arrow]

2024-04-21 Thread via GitHub
github-actions[bot] commented on PR #41328: URL: https://github.com/apache/arrow/pull/41328#issuecomment-2068442696 :warning: GitHub issue #41327 **has been automatically assigned in GitHub** to PR creator.

Re: [I] Provide Arrow Schema Hint to Parquet Reader [arrow-rs]

2024-04-21 Thread via GitHub
liukun4515 commented on issue #5657: URL: https://github.com/apache/arrow-rs/issues/5657#issuecomment-2068514681 > The inference logic is already set up to use the arrow schema as a hint as opposed to authoritative, if you give it something invalid it will just ignore it thanks, got

Re: [I] [C++][Parquet] Crash / heap-use-after-free in ByteArrayChunkedRecordReader::ReadValuesSpaced() on a corrupted Parquet file [arrow]

2024-04-21 Thread via GitHub
mapleFU commented on issue #41321: URL: https://github.com/apache/arrow/issues/41321#issuecomment-2068549739 What version of the code are you using? When I read this I got "Invalid or corrupted bit_width"; did you select some columns during the read?

Re: [PR] GH-41095: [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41276: URL: https://github.com/apache/arrow/pull/41276#issuecomment-2068463658 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 25bb627519f17835cb0b6ff95588f2de0cbaf7ad. There were

[PR] GH-41327: [Ruby] Show type name in Arrow::Table#to_s [arrow]

2024-04-21 Thread via GitHub
kou opened a new pull request, #41328: URL: https://github.com/apache/arrow/pull/41328 ### Rationale for this change It's useful to detect type difference. ### What changes are included in this PR? Add `:show_column_type` option to `Arrow::Table#to_s` and enables it by

Re: [PR] feat(c/driver/postgresql): add money type and test intervals [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1741: URL: https://github.com/apache/arrow-adbc/pull/1741

Re: [PR] GH-41112: [C++] Clean up unused parameter warnings [arrow]

2024-04-21 Thread via GitHub
kou commented on PR #4: URL: https://github.com/apache/arrow/pull/4#issuecomment-2068485191 Hmm. It seems that `[[maybe_unused]]` doesn't work with g++ 8.5.0: https://github.com/ursacomputing/crossbow/actions/runs/8772764647/job/24072078525#step:6:167 ```text -- The

Re: [PR] Introduce `Compare` to support nulls comparison [arrow-rs]

2024-04-21 Thread via GitHub
tustvold commented on PR #5672: URL: https://github.com/apache/arrow-rs/pull/5672#issuecomment-2067994839 As stated on the ticket, this needs more thought as it won't work for distinct, it is on my list to work out a way to support this but I haven't had time recently

Re: [PR] GH-41095: [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
Tom-Newton commented on code in PR #41276: URL: https://github.com/apache/arrow/pull/41276#discussion_r1573734719 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -381,6 +388,24 @@ AzureOptions::MakeDataLakeServiceClient() const { return Status::Invalid("AzureOptions doesn't

[PR] Fix integration tests by downgrading jobserver (#5673) [arrow-rs]

2024-04-21 Thread via GitHub
tustvold opened a new pull request, #5674: URL: https://github.com/apache/arrow-rs/pull/5674 # Which issue does this PR close? Closes #5673 # Rationale for this change # What changes are included in this PR? # Are there any user-facing

[PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
rouault opened a new pull request, #41320: URL: https://github.com/apache/arrow/pull/41320 ### Rationale for this change Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a corrupted Parquet file ### What changes are included in this PR? Add a

Re: [PR] Fix integration tests by downgrading jobserver (#5673) [arrow-rs]

2024-04-21 Thread via GitHub
tustvold commented on PR #5674: URL: https://github.com/apache/arrow-rs/pull/5674#issuecomment-2068072878 I'm going to merge this in to get CI green again

Re: [PR] GH-41323: [R] Redo how summarize() evaluates expressions [arrow]

2024-04-21 Thread via GitHub
github-actions[bot] commented on PR #41223: URL: https://github.com/apache/arrow/pull/41223#issuecomment-2068116140 :warning: GitHub issue #41323 **has been automatically assigned in GitHub** to PR creator.

Re: [I] [Python] Broken unit test: Segmentation fault in test_make_write_options_error [arrow]

2024-04-21 Thread via GitHub
llama90 commented on issue #41312: URL: https://github.com/apache/arrow/issues/41312#issuecomment-2068024528 I checked a few things: I was using Python `3.10` on my environment, but the CI system was using Python `3.11`. So, I set up Python `3.11` with conda and ran the tests again.

Re: [PR] Introduce `Compare` to support nulls comparison [arrow-rs]

2024-04-21 Thread via GitHub
tustvold commented on code in PR #5672: URL: https://github.com/apache/arrow-rs/pull/5672#discussion_r1573749060 ## arrow-ord/src/sort.rs: ## @@ -725,16 +726,19 @@ impl LexicographicalComparator { None => (true, true), }; +// TODO:

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
github-actions[bot] commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068042898 :warning: GitHub issue #41317 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068064220 Digging further, seeing that FileReaderImpl::DecodeRowGroups() already calls Table::Validate(), but that GetRecordBatchReader() didn't, I've also tested successfully the following

Re: [I] Dyn Comparison of Nested Arrays [arrow-rs]

2024-04-21 Thread via GitHub
tustvold commented on issue #5426: URL: https://github.com/apache/arrow-rs/issues/5426#issuecomment-2068038015 Yeah, I can't remember why I decided this didn't work :sweat_smile: I guess the proof will be to wire up a draft implementation showing it working, and then we'll know for

Re: [PR] Introduce `Compare` to support nulls comparison [arrow-rs]

2024-04-21 Thread via GitHub
tustvold commented on code in PR #5672: URL: https://github.com/apache/arrow-rs/pull/5672#discussion_r1573748425 ## arrow-ord/src/ord.rs: ## @@ -24,33 +24,107 @@ use arrow_buffer::ArrowNativeType; use arrow_schema::ArrowError; use std::cmp::Ordering; +#[derive(Debug,

[I] Integration Tests Failing Due To Old GLibc [arrow-rs]

2024-04-21 Thread via GitHub
tustvold opened a new issue, #5673: URL: https://github.com/apache/arrow-rs/issues/5673 **Describe the bug** Integration tests are currently failing to link correctly https://github.com/apache/arrow-rs/actions/runs/8736525083/job/24072347973 This appears to be caused

Re: [I] [Python] Broken unit test: Segmentation fault in test_make_write_options_error [arrow]

2024-04-21 Thread via GitHub
llama90 commented on issue #41312: URL: https://github.com/apache/arrow/issues/41312#issuecomment-2067990519 I tested only the `test_make_write_options_error` as a test case in CI, and there were no issues.

Re: [I] Dyn Comparison of Nested Arrays [arrow-rs]

2024-04-21 Thread via GitHub
jayzhan211 commented on issue #5426: URL: https://github.com/apache/arrow-rs/issues/5426#issuecomment-2068012133 > As stated on the ticket, this needs more thought as it won't work for distinct, it is on my list to work out a way to support this but I haven't had time recently I

Re: [I] [Python] `tests/test_feather.py::test_roundtrip`: `hypothesis.errors.FailedHealthCheck: Data generation is extremely slow` […] [arrow]

2024-04-21 Thread via GitHub
mgorny commented on issue #41318: URL: https://github.com/apache/arrow/issues/41318#issuecomment-2068035759 More of them in the next run: ```pytb == FAILURES ===

Re: [PR] GH-41314: [CI][Python] Add a job on ARM64 macOS [arrow]

2024-04-21 Thread via GitHub
llama90 commented on PR #41313: URL: https://github.com/apache/arrow/pull/41313#issuecomment-2068041328 hello @kou. Separately from the issue([41312](https://github.com/apache/arrow/issues/41312)), I cleaned this PR. The issue seems to be that in CI, tests that fail during

Re: [PR] Fix integration tests by downgrading jobserver (#5673) [arrow-rs]

2024-04-21 Thread via GitHub
tustvold merged PR #5674: URL: https://github.com/apache/arrow-rs/pull/5674

Re: [I] Integration Tests Failing Due To Old GLibc [arrow-rs]

2024-04-21 Thread via GitHub
tustvold closed issue #5673: Integration Tests Failing Due To Old GLibc URL: https://github.com/apache/arrow-rs/issues/5673

Re: [PR] GH-41323: [R] Redo how summarize() evaluates expressions [arrow]

2024-04-21 Thread via GitHub
nealrichardson commented on code in PR #41223: URL: https://github.com/apache/arrow/pull/41223#discussion_r1573849031 ## r/R/dplyr-summarize.R: ## @@ -221,25 +257,27 @@ do_arrow_summarize <- function(.data, ..., .groups = NULL) { # It's more complex than other places

Re: [PR] feat(go/adbc/driver/bigquery): add support for Google BigQuery [arrow-adbc]

2024-04-21 Thread via GitHub
cocoa-xu commented on PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#issuecomment-2068192032 Hi I've updated and implemented a bit more. Although I'm not 100% sure if this is the right/best way to do some functions... I'll be happy to make any changes. Besides that, I

Re: [PR] GH-41256: [Format][Docs] Add a canonical extension type specification for JSON [arrow]

2024-04-21 Thread via GitHub
progger-dev commented on code in PR #41257: URL: https://github.com/apache/arrow/pull/41257#discussion_r1573871660 ## docs/source/format/CanonicalExtensions.rst: ## @@ -251,6 +251,27 @@ Variable shape tensor Values inside each **data** tensor element are stored in

Re: [PR] GH-41256: [Format][Docs] Add a canonical extension type specification for JSON [arrow]

2024-04-21 Thread via GitHub
rouault commented on code in PR #41257: URL: https://github.com/apache/arrow/pull/41257#discussion_r1573896963 ## docs/source/format/CanonicalExtensions.rst: ## @@ -251,6 +251,27 @@ Variable shape tensor Values inside each **data** tensor element are stored in

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725

Re: [PR] feat(rust): add public abstract API and dummy driver implementation [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1725: URL: https://github.com/apache/arrow-adbc/pull/1725#issuecomment-2068331318 Augh, I forgot to edit out the pings from the commit message...

Re: [PR] fix(format): correct duplicated statistics names [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1736: URL: https://github.com/apache/arrow-adbc/pull/1736#issuecomment-2068341287 We don't have a Python enum for these (un)fortunately

Re: [PR] fix(format): correct duplicated statistics names [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1736: URL: https://github.com/apache/arrow-adbc/pull/1736

[PR] ci: fix wheel build [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm opened a new pull request, #1740: URL: https://github.com/apache/arrow-adbc/pull/1740 (no comment)

Re: [PR] feat(go/adbc/driver/bigquery): add support for Google BigQuery [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#issuecomment-2068351876 all those TODOs are fine to split into later PRs

Re: [PR] fix(go/adbc/driver/snowflake): handle quotes properly [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1738: URL: https://github.com/apache/arrow-adbc/pull/1738#issuecomment-2068355191 I'm fixing CI so we can provide a wheel for testing.

Re: [PR] fix(go/adbc/driver/snowflake): handle quotes properly [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1738: URL: https://github.com/apache/arrow-adbc/pull/1738#issuecomment-2068360745 I'm not sure this is the way to go. One, we're basically opening things up to SQL injection attacks if we don't escape/quote input. Two, it breaks the idea that what you pass in is

Re: [PR] ci: fix wheel build [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm commented on PR #1740: URL: https://github.com/apache/arrow-adbc/pull/1740#issuecomment-2068364326 Looks like the EOF flake is because something in upstream arrow-go is propagating an EOF to us instead of checking for it properly

Re: [PR] ci: fix wheel build [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1740: URL: https://github.com/apache/arrow-adbc/pull/1740

[PR] feat(python): Implement numpy conversion [arrow-nanoarrow]

2024-04-21 Thread via GitHub
paleolimbot opened a new pull request, #438: URL: https://github.com/apache/arrow-nanoarrow/pull/438 Currently just an experiment. I think converting to numpy is in scope here because numpy operates at a lower level of abstraction than arrow (i.e., I'm not sure one would want/expect numpy

Re: [I] Unable to build on MacOSX [arrow]

2024-04-21 Thread via GitHub
llama90 commented on issue #41322: URL: https://github.com/apache/arrow/issues/41322#issuecomment-2068279920 Would you like to refer to this document? - https://arrow.apache.org/docs/developers/python.html#building-on-linux-and-macos

Re: [PR] GH-41095: [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
kou commented on PR #41276: URL: https://github.com/apache/arrow/pull/41276#issuecomment-2068301091 I'll merge this.

Re: [PR] GH-41095: [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
kou commented on code in PR #41276: URL: https://github.com/apache/arrow/pull/41276#discussion_r1573992832 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -381,6 +388,24 @@ AzureOptions::MakeDataLakeServiceClient() const { return Status::Invalid("AzureOptions doesn't contain a

Re: [I] [C++][FS][Azure] CopyFile doesn't work with Azure hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
kou commented on issue #41095: URL: https://github.com/apache/arrow/issues/41095#issuecomment-2068301387 Issue resolved by pull request 41276 https://github.com/apache/arrow/pull/41276

Re: [PR] GH-41095: [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support [arrow]

2024-04-21 Thread via GitHub
kou merged PR #41276: URL: https://github.com/apache/arrow/pull/41276

Re: [PR] GH-41307: [Java] Use org.apache:apache parent pom version 31 [arrow]

2024-04-21 Thread via GitHub
lidavidm commented on PR #41309: URL: https://github.com/apache/arrow/pull/41309#issuecomment-2068313789 @laurentgo I remember this error from the last PR, there's something about the new build that seems to break the action: ``` Error: The template is not valid.

Re: [PR] GH-41307: [Java] Use org.apache:apache parent pom version 31 [arrow]

2024-04-21 Thread via GitHub
lidavidm commented on PR #41309: URL: https://github.com/apache/arrow/pull/41309#issuecomment-2068313301 @zeroshade This is a rather concerning CI flake ``` == Testing C ArrowArray from file 'custom_metadata'

Re: [I] Unable to build on MacOSX [arrow]

2024-04-21 Thread via GitHub
kou commented on issue #41322: URL: https://github.com/apache/arrow/issues/41322#issuecomment-2068323508 It's strange that your pip didn't use wheel. We have wheels for macOS: https://pypi.org/project/pyarrow/#files

Re: [PR] ci: update PR title validity regex [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1735: URL: https://github.com/apache/arrow-adbc/pull/1735

Re: [PR] GH-41314: [CI][Python] Add a job on ARM64 macOS [arrow]

2024-04-21 Thread via GitHub
kou commented on PR #41313: URL: https://github.com/apache/arrow/pull/41313#issuecomment-2068304411 Yes. Thanks.

Re: [PR] GH-41314: [CI][Python] Add a job on ARM64 macOS [arrow]

2024-04-21 Thread via GitHub
kou commented on PR #41313: URL: https://github.com/apache/arrow/pull/41313#issuecomment-2068305384 @github-actions crossbow submit -g python

Re: [PR] GH-41314: [CI][Python] Add a job on ARM64 macOS [arrow]

2024-04-21 Thread via GitHub
github-actions[bot] commented on PR #41313: URL: https://github.com/apache/arrow/pull/41313#issuecomment-2068307218 Revision: 6c7ac94ea77052ee46b3c1663f0f555a1ebf7180 Submitted crossbow builds: [ursacomputing/crossbow @

Re: [I] [Python] Broken unit test: Segmentation fault in test_make_write_options_error [arrow]

2024-04-21 Thread via GitHub
kou commented on issue #41312: URL: https://github.com/apache/arrow/issues/41312#issuecomment-2068312242 @AlenkaF Could you take a look at this?

Re: [PR] docs: add a short page about new drivers [arrow-adbc]

2024-04-21 Thread via GitHub
lidavidm merged PR #1737: URL: https://github.com/apache/arrow-adbc/pull/1737