Re: [PR] GH-41361: [C++][Parquet] Optimize DelimitRecords by batch execution [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on code in PR #41362: URL: https://github.com/apache/arrow/pull/41362#discussion_r157758 ## cpp/src/parquet/column_reader.cc: ## @@ -1677,40 +1677,43 @@ class TypedRecordReader : public TypedColumnReaderImpl, int64_t DelimitRecords(int64_t num_records,
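For background on what "delimiting records" means (Parquet semantics, not taken from the diff): a repetition level of 0 marks the start of a new record, so finding the end of N records means finding the (N+1)-th zero. The following is a minimal sketch of a batched version under that assumption. It is not the PR's implementation. `DelimitRecordsBatched` and `DelimitResult` are hypothetical names, and the sketch treats the end of the input as the end of the last record.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical result type for this sketch (not the Arrow API).
struct DelimitResult {
  int64_t records_delimited;  // complete records found
  int64_t levels_consumed;    // repetition levels belonging to those records
};

// rep_levels[0] is assumed to start a record (level 0). Rather than testing
// every level against the target one at a time, count record starts per batch
// and only scan element-by-element inside the batch holding the boundary.
DelimitResult DelimitRecordsBatched(const int16_t* rep_levels, int64_t num_levels,
                                    int64_t num_records, int64_t batch_size = 64) {
  int64_t starts_seen = 0;  // number of level-0 entries consumed so far
  int64_t pos = 0;
  while (pos < num_levels) {
    const int64_t end = std::min(pos + batch_size, num_levels);
    // Tight counting loop without early exit, friendly to auto-vectorization.
    int64_t starts_in_batch = 0;
    for (int64_t i = pos; i < end; ++i) starts_in_batch += (rep_levels[i] == 0);
    if (starts_seen + starts_in_batch <= num_records) {
      // The (num_records + 1)-th start is not in this batch: take it whole.
      starts_seen += starts_in_batch;
      pos = end;
      continue;
    }
    // The boundary is in this batch: stop right before the next record start.
    for (int64_t i = pos; i < end; ++i) {
      if (rep_levels[i] == 0 && ++starts_seen > num_records) {
        return {num_records, i};
      }
    }
  }
  // Input exhausted: in this sketch every started record counts as complete.
  return {starts_seen, num_levels};
}
```

With levels `{0, 1, 1, 0, 0, 1, 0}` and `num_records = 2`, the first two records span levels 0 to 3, so four levels are consumed.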

Re: [PR] GH-41361: [C++][Parquet] Optimize DelimitRecords by batch execution [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41362: URL: https://github.com/apache/arrow/pull/41362#issuecomment-2074158442 @pitrou @emkornfield @fatemehp @wgtmac -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] GH-41361: [C++][Parquet] Optimize DelimitRecords by batch execution [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41362: URL: https://github.com/apache/arrow/pull/41362#issuecomment-2074167343 On My M1Pro MacOS with Release(-O2): After: ```

Re: [I] Arrow FLight SQL: invalid location in get_flight_info_prepared_statement [arrow-rs]

2024-04-24 Thread via GitHub
Curricane commented on issue #5669: URL: https://github.com/apache/arrow-rs/issues/5669#issuecomment-2074104509 Is there any progress on this matter?

Re: [I] Rust Interval definition incorrect [arrow-rs]

2024-04-24 Thread via GitHub
tustvold commented on issue #5654: URL: https://github.com/apache/arrow-rs/issues/5654#issuecomment-2074140467 One way to avoid this leading to subtle downstream breakage might be to do something along the lines of https://github.com/apache/arrow-rs/issues/3125

Re: [PR] GH-37938: [Swift] initial impl of C Data interface [arrow]

2024-04-24 Thread via GitHub
abandy commented on code in PR #41342: URL: https://github.com/apache/arrow/pull/41342#discussion_r1577673357 ## swift/Arrow/Package.swift: ## @@ -36,18 +36,27 @@ let package = Package( // and therefore doesn't include the unaligned buffer swift changes. //

Re: [I] [DISCUSS] Reducing cadence of major arrow-rs releases introducing patch releases [arrow-rs]

2024-04-24 Thread via GitHub
aljazerzen commented on issue #5368: URL: https://github.com/apache/arrow-rs/issues/5368#issuecomment-2074666857 Unfortunately, I agree that because of the bump of `pyo3` (and maybe `object_store`, didn't check), `arrow` needs a major version bump as well. This will create a lot of

Re: [PR] GH-41336: [C++][Compute] Fix the bug of decimal types skipping cast in IfElse related expression function calls [arrow]

2024-04-24 Thread via GitHub
ZhangHuiGui commented on code in PR #41363: URL: https://github.com/apache/arrow/pull/41363#discussion_r1577895599 ## cpp/src/arrow/compute/kernels/scalar_if_else.cc: ## @@ -1195,6 +1195,43 @@ struct ResolveIfElseExec { } }; +template +Result
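The PR title says decimal arguments were skipping a cast in `if_else`-style functions. One common way to pick a cast target for two decimal branches is the smallest type that holds both without losing digits: keep the larger scale and the larger number of integral digits. The following is a sketch of that rule only. The snippet does not show which rule the PR uses, `DecimalType` and `CommonDecimal` are hypothetical, and a real implementation must also reject or widen results above the maximum precision (38 for decimal128).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a decimal type's parameters.
struct DecimalType {
  int32_t precision;  // total number of digits
  int32_t scale;      // digits after the decimal point
};

// Lossless common type: max scale, plus max integral digits (precision - scale).
// Assumes the result stays within the maximum supported precision.
DecimalType CommonDecimal(DecimalType a, DecimalType b) {
  const int32_t scale = std::max(a.scale, b.scale);
  const int32_t integral = std::max(a.precision - a.scale, b.precision - b.scale);
  return {integral + scale, scale};
}
```

For example, `decimal(10, 0)` and `decimal(5, 3)` would both be cast to `decimal(13, 3)` before choosing between branches.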

Re: [I] Connecting to postgres (Greenplum) database very slow [arrow-adbc]

2024-04-24 Thread via GitHub
mcrumiller commented on issue #1755: URL: https://github.com/apache/arrow-adbc/issues/1755#issuecomment-2075049611 9s for the first, 261ms for the second.

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075086943 Thanks! @rouault Would you mind edit this? Or let me handle this with a new patch?

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075149363 > Would you mind edit this? not sure in which direction: your above proposal " // Check all columns has same row-size" or the alternative proposal I made in

[PR] Support casting from byte array to byte view array. [arrow-rs]

2024-04-24 Thread via GitHub
RinChanNOWWW opened a new pull request, #5686: URL: https://github.com/apache/arrow-rs/pull/5686 # Which issue does this PR close? Part of #5508. # Rationale for this change # What changes are included in this PR? Support casting from byte
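A cast from the offsets-based byte array layout to byte views can be pictured as follows. This is a C++ sketch rather than arrow-rs code. It assumes the 16-byte view layout from the Arrow columnar format: a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index and offset. `View`, `BinaryToViews` and `ViewValue` are hypothetical helpers, and nulls are ignored.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// One 16-byte view: length, then inline bytes or prefix + buffer index + offset.
struct View {
  int32_t length;
  std::array<uint8_t, 12> rest;
};
static_assert(sizeof(View) == 16, "views are 16 bytes");

// Values of at most 12 bytes are copied inline; longer values point into the
// original data buffer, reused as variadic buffer 0 so those bytes are not copied.
std::vector<View> BinaryToViews(const std::vector<int32_t>& offsets, const std::string& data) {
  std::vector<View> views;
  if (offsets.empty()) return views;
  views.reserve(offsets.size() - 1);
  for (size_t i = 0; i + 1 < offsets.size(); ++i) {
    const int32_t start = offsets[i];
    const int32_t len = offsets[i + 1] - start;
    View v{};  // zero-initialized, so short values are zero padded
    v.length = len;
    if (len <= 12) {
      std::memcpy(v.rest.data(), data.data() + start, len);
    } else {
      const int32_t buffer_index = 0;
      std::memcpy(v.rest.data(), data.data() + start, 4);  // 4-byte prefix
      std::memcpy(v.rest.data() + 4, &buffer_index, 4);
      std::memcpy(v.rest.data() + 8, &start, 4);
    }
    views.push_back(v);
  }
  return views;
}

// Read a value back (this sketch only ever uses buffer 0).
std::string ViewValue(const View& v, const std::string& data) {
  if (v.length <= 12) return std::string(reinterpret_cast<const char*>(v.rest.data()), v.length);
  int32_t offset;
  std::memcpy(&offset, v.rest.data() + 8, 4);
  return data.substr(offset, v.length);
}
```

Reusing the source data buffer is one option. A cast could also copy long values into fresh buffers, trading memory for independence from the input.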

Re: [PR] GH-41307: [Java] Use org.apache:apache parent pom version 31 [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41309: URL: https://github.com/apache/arrow/pull/41309#issuecomment-2074749580 @laurentgo I looked into the debug logs, I am not quite sure what is wrong here. But would it be possible to check this per module? I know this change is not that big, but just a

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-24 Thread via GitHub
Kimahriman commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2074811609 > > I'm not sure whats @alamb @tustvold opinion, would it make sense to have your repo in datafusion-contrib @Kimahriman ? > > If so, I would be happy to create a repo in

Re: [PR] MINOR: [Java] Bump org.apache.calcite.avatica:avatica from 1.24.0 to 1.25.0 in /java [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #41212: URL: https://github.com/apache/arrow/pull/41212#issuecomment-2074834762 Revision: bed59960750b32f119d3627839473f6ff5be6bda Submitted crossbow builds: [ursacomputing/crossbow @

Re: [PR] MINOR: [Java] Bump org.cyclonedx:cyclonedx-maven-plugin from 2.7.11 to 2.8.0 in /java [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41210: URL: https://github.com/apache/arrow/pull/41210#issuecomment-2074833495 @dependabot rebase

Re: [PR] MINOR: [Java] Bump org.cyclonedx:cyclonedx-maven-plugin from 2.7.11 to 2.8.0 in /java [arrow]

2024-04-24 Thread via GitHub
dependabot[bot] commented on PR #41210: URL: https://github.com/apache/arrow/pull/41210#issuecomment-2074833600 Sorry, only users with push access can use that command.

Re: [PR] MINOR: [Java] Bump org.cyclonedx:cyclonedx-maven-plugin from 2.7.11 to 2.8.0 in /java [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41210: URL: https://github.com/apache/arrow/pull/41210#issuecomment-2074835050 @kou we need to rebase this PR, since there were some fixes for CI failures. And then we should run the Java crossbows. Could you please help?

Re: [PR] MINOR: [Java] Bump org.apache.maven.plugins:maven-plugin-plugin from 3.11.0 to 3.12.0 in /java [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41211: URL: https://github.com/apache/arrow/pull/41211#issuecomment-2074832059 @kou could we re-run the `java-jars`, not sure if the failure is related or flaky.

Re: [PR] MINOR: [Java] Bump org.apache.calcite.avatica:avatica from 1.24.0 to 1.25.0 in /java [arrow]

2024-04-24 Thread via GitHub
lidavidm commented on PR #41212: URL: https://github.com/apache/arrow/pull/41212#issuecomment-2074925949 @dependabot rebase

Re: [PR] MINOR: [Java] Bump org.apache.maven.plugins:maven-plugin-plugin from 3.11.0 to 3.12.0 in /java [arrow]

2024-04-24 Thread via GitHub
lidavidm commented on PR #41211: URL: https://github.com/apache/arrow/pull/41211#issuecomment-2074926527 Retried

Re: [I] [C++] [Parquet] Crash / heap-buffer-overflow in TableBatchReader::ReadNext() on a corrupted Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on issue #41317: URL: https://github.com/apache/arrow/issues/41317#issuecomment-2074947638 I found the root-cause is: ``` /Users/mwish/workspace/CMakeLibs/arrow/cpp/src/arrow/table.cc:622: Check failed: _s.ok() Operation failed: table_.ValidateFull() Bad

Re: [PR] GH-41323: [R] Redo how summarize() evaluates expressions [arrow]

2024-04-24 Thread via GitHub
nealrichardson commented on code in PR #41223: URL: https://github.com/apache/arrow/pull/41223#discussion_r1577888964 ## r/NEWS.md: ## @@ -19,6 +19,9 @@ # arrow 16.0.0.9000 +* R functions that users write that use functions that Arrow supports in dataset queries now can

Re: [PR] GH-41307: [Java] Use org.apache:apache parent pom version 31 [arrow]

2024-04-24 Thread via GitHub
laurentgo commented on PR #41309: URL: https://github.com/apache/arrow/pull/41309#issuecomment-2074948712 > It appears Maven is now building things that aren't readable by us for some reason? Which is weird but things are also run inside a docker container with volume mount and the

Re: [I] Connecting to postgres (Greenplum) database very slow [arrow-adbc]

2024-04-24 Thread via GitHub
mcrumiller commented on issue #1755: URL: https://github.com/apache/arrow-adbc/issues/1755#issuecomment-2074991467 Hi @lidavidm, thanks for your help. I'm new to ADBC so apologies that I'm not very well-versed here. I'm in a corporate environment so unfortunately I don't have any admin

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075058403 > Also cc @felipecrv , do you think TableReader would check the input is valid? I think the generate side should checks it, and the consumer would better dcheck that? Unless the

Re: [PR] GH-40339: [Java] StringView Initial Implementation [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #40340: URL: https://github.com/apache/arrow/pull/40340#issuecomment-2074729269 @github-actions crossbow submit *java*

Re: [PR] Feature gate arrow-flight client [arrow-rs]

2024-04-24 Thread via GitHub
leoyvens commented on PR #5683: URL: https://github.com/apache/arrow-rs/pull/5683#issuecomment-2074961934 Yes, in tonic, the `transport` feature is required for the "batteries included" server. But arrow-rs does not currently expose a high-level server API that depends on the tonic

Re: [I] [C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression [arrow]

2024-04-24 Thread via GitHub
ZhangHuiGui commented on issue #41094: URL: https://github.com/apache/arrow/issues/41094#issuecomment-2074961032 > If you're interested in adding support for them, that would be greatly appreciated Thanks for your suggestion, we'll take a look.

Re: [PR] Feature gate arrow-flight client [arrow-rs]

2024-04-24 Thread via GitHub
tustvold commented on PR #5683: URL: https://github.com/apache/arrow-rs/pull/5683#issuecomment-2074987273 My understanding of the tonic traits was that they were so low level that whilst in theory one could plugin an alternative this wasn't really practical. I'm still confused why this

[PR] feat(rust): add the driver exporter [arrow-adbc]

2024-04-24 Thread via GitHub
alexandreyc opened a new pull request, #1756: URL: https://github.com/apache/arrow-adbc/pull/1756 Third PR for the Rust implementation containing the driver exporter which allows to automatically create a C-compatible driver from a native Rust driver. CC @mbrobbel

Re: [PR] GH-41262: [Java][FlightSQL] Implement stateless prepared statements [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41237: URL: https://github.com/apache/arrow/pull/41237#issuecomment-2074841212 @lidavidm most CIs are failing but the error message is ```bash Error: The operation was canceled. ``` Could we re-run?

Re: [PR] MINOR: [Java] Bump org.cyclonedx:cyclonedx-maven-plugin from 2.7.11 to 2.8.0 in /java [arrow]

2024-04-24 Thread via GitHub
lidavidm commented on PR #41210: URL: https://github.com/apache/arrow/pull/41210#issuecomment-2074925199 @dependabot rebase

Re: [PR] GH-41307: [Java] Use org.apache:apache parent pom version 31 [arrow]

2024-04-24 Thread via GitHub
lidavidm commented on PR #41309: URL: https://github.com/apache/arrow/pull/41309#issuecomment-2074922927 I mean the error is right there. ``` ##[debug][Error: EACCES: permission denied, open '/home/runner/work/arrow/arrow/java/adapter/avro/target/arrow-avro-17.0.0-SNAPSHOT.jar']

Re: [I] Connecting to postgres (Greenplum) database very slow [arrow-adbc]

2024-04-24 Thread via GitHub
mcrumiller commented on issue #1755: URL: https://github.com/apache/arrow-adbc/issues/1755#issuecomment-2075082956 First query: 6,667,495 rows in 97.72s. Second query: 141,851 rows in 11.59s.

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075081456 > Just curious should we add DCHECK in TableReader helps debugging here Add a `DCHECK` in `TableBatchReader`, document the precondition (with `/// \pre ...` in the constructor,
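The suggestion above is to document the constructor's precondition and check it only in debug builds. The following is a minimal sketch of that pattern, outside Arrow. `BatchReaderSketch` is hypothetical, and `assert` stands in for Arrow's `DCHECK`. Code that receives untrusted input, such as a fuzzed Parquet file, would still need to validate before constructing the reader, because the debug check disappears in release builds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified reader over a table described only by lengths.
class BatchReaderSketch {
 public:
  /// \pre every entry of column_lengths equals num_rows
  BatchReaderSketch(int64_t num_rows, std::vector<int64_t> column_lengths)
      : num_rows_(num_rows), column_lengths_(std::move(column_lengths)) {
    for (int64_t len : column_lengths_) {
      // Debug-only check of the documented precondition (DCHECK-like).
      assert(len == num_rows_ && "all columns must have num_rows rows");
      (void)len;  // silence unused-variable warnings when NDEBUG is set
    }
  }

  // Returns the size of the next batch (at most max_rows), 0 when exhausted.
  int64_t NextBatchSize(int64_t max_rows) {
    const int64_t n = std::min(max_rows, num_rows_ - offset_);
    offset_ += n;
    return n;
  }

 private:
  int64_t num_rows_;
  std::vector<int64_t> column_lengths_;
  int64_t offset_ = 0;
};
```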

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
zeroshade commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075165132 @github-actions crossbow submit preview-docs

Re: [PR] Arrow Flight SQL example JDBC driver incompatibility [arrow-rs]

2024-04-24 Thread via GitHub
tustvold merged PR #5666: URL: https://github.com/apache/arrow-rs/pull/5666

Re: [I] Arrow FLight SQL: invalid location in get_flight_info_prepared_statement [arrow-rs]

2024-04-24 Thread via GitHub
tustvold closed issue #5669: Arrow FLight SQL: invalid location in get_flight_info_prepared_statement URL: https://github.com/apache/arrow-rs/issues/5669

Re: [I] Arrow Flight SQL example server: do_handshake should include auth header [arrow-rs]

2024-04-24 Thread via GitHub
tustvold closed issue #5665: Arrow Flight SQL example server: do_handshake should include auth header URL: https://github.com/apache/arrow-rs/issues/5665

Re: [PR] fix(csharp/src/Apache.Arrow.Adbc): Add support to the C Exporter for converting exceptions into AdbcErrors [arrow-adbc]

2024-04-24 Thread via GitHub
CurtHagenlocher commented on code in PR #1752: URL: https://github.com/apache/arrow-adbc/pull/1752#discussion_r156163 ## csharp/src/Apache.Arrow.Adbc/C/CAdbcDriverExporter.cs: ## @@ -708,13 +754,8 @@ public void Dispose() connection = null; }

Re: [PR] feat(glib): add GADBCArrowConnection [arrow-adbc]

2024-04-24 Thread via GitHub
lidavidm merged PR #1754: URL: https://github.com/apache/arrow-adbc/pull/1754

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075064489 > Is the Parquet reader producing an invalid Table instance and passing it to TableReader? Yes. User is running fuzzing on parquet file, when parsing a corrupt parquet file, we

Re: [I] [CI][Python] Enable ccache for PyArrow Builds [arrow]

2024-04-24 Thread via GitHub
llama90 commented on issue #41316: URL: https://github.com/apache/arrow/issues/41316#issuecomment-2074628780 take

Re: [PR] GH-37938: [Swift] initial impl of C Data interface [arrow]

2024-04-24 Thread via GitHub
abandy commented on code in PR #41342: URL: https://github.com/apache/arrow/pull/41342#discussion_r1577679471 ## swift/Arrow/Sources/Arrow/ArrowCExporter.swift: ## @@ -0,0 +1,134 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075003778 Also cc @felipecrv , do you think TableReader would check the input is valid? I think the generate side should checks it, and the consumer would better dcheck that?

Re: [I] [C++] Move LocalFileSystem to a separate module [arrow]

2024-04-24 Thread via GitHub
bkietz commented on issue #40342: URL: https://github.com/apache/arrow/issues/40342#issuecomment-2075084308 Issue resolved by pull request 40356 https://github.com/apache/arrow/pull/40356

Re: [PR] GH-40342: [C++] move LocalFileSystem to the registry [arrow]

2024-04-24 Thread via GitHub
bkietz merged PR #40356: URL: https://github.com/apache/arrow/pull/40356

Re: [PR] GH-40339: [Java] StringView Initial Implementation [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #40340: URL: https://github.com/apache/arrow/pull/40340#issuecomment-2074766904 @lidavidm I addressed the reviews. And also I added a few additional test cases for views. And earlier there were 4 tests basically covering the view-based logics in storing data.

Re: [PR] GH-40339: [Java] StringView Initial Implementation [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #40340: URL: https://github.com/apache/arrow/pull/40340#issuecomment-2074806399 `Dev / Source Release and Merge Script on macos-latest ` CI keeps failing. I think there is an issue in installing `Ruby` cc @kou

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2074951956 See https://github.com/apache/arrow/issues/41317#issuecomment-2074947638 The root cause is that we didn't check the length. This can be detect by add the
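The root cause named here is a missing length check: a column shorter than the table's row count reaches the batch reader. The following is a minimal sketch of the producer-side check ("all columns have the same row count") with plain integers standing in for Arrow columns. `CheckColumnLengths` and its message format are hypothetical. In Arrow itself, `Table::Validate()` / `ValidateFull()` perform this kind of check.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Returns an error message for the first column whose length differs from
// num_rows, or std::nullopt when every column matches.
std::optional<std::string> CheckColumnLengths(int64_t num_rows,
                                              const std::vector<int64_t>& column_lengths) {
  for (size_t i = 0; i < column_lengths.size(); ++i) {
    if (column_lengths[i] != num_rows) {
      return "column " + std::to_string(i) + " has " + std::to_string(column_lengths[i]) +
             " rows, expected " + std::to_string(num_rows);
    }
  }
  return std::nullopt;
}
```

Returning an error here, instead of relying only on a debug assertion, is what lets a corrupted file fail cleanly in release builds.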

Re: [PR] GH-40636: [C++][Parquet] Improve fallback encoding choice in column writer. [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on code in PR #40957: URL: https://github.com/apache/arrow/pull/40957#discussion_r1577957925 ## cpp/src/parquet/encoding.cc: ## @@ -3988,4 +3988,56 @@ std::unique_ptr MakeDictDecoder(Type::type type_num, } } // namespace detail + // + // +bool

Re: [PR] feat(rust): add the driver exporter [arrow-adbc]

2024-04-24 Thread via GitHub
mbrobbel commented on PR #1756: URL: https://github.com/apache/arrow-adbc/pull/1756#issuecomment-2075156735 > I'm not sure why the CI is broken... Maybe a cache invalidation issue? > > CC @mbrobbel Looks like https://github.com/actions/runner-images/issues/9732.

[I] feat object_store: moving tests from src/ to a tests/ folder and enabling access to test functions for enabling a shared integration test suite [arrow-rs]

2024-04-24 Thread via GitHub
Silemo opened a new issue, #5685: URL: https://github.com/apache/arrow-rs/issues/5685 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Hi young dev here! I am currently busy doing an external implementation (or to

Re: [PR] GH-40339: [Java] StringView Initial Implementation [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #40340: URL: https://github.com/apache/arrow/pull/40340#issuecomment-2074733574 Revision: e7cb2d3d240b9b634897f38ff1f3d12772b7031f Submitted crossbow builds: [ursacomputing/crossbow @

Re: [PR] fix(csharp/src/Apache.Arrow.Adbc): Add support to the C Exporter for converting exceptions into AdbcErrors [arrow-adbc]

2024-04-24 Thread via GitHub
CurtHagenlocher commented on PR #1752: URL: https://github.com/apache/arrow-adbc/pull/1752#issuecomment-2074799724 This is for external code calling into a C#-implemented driver through the C API so it doesn't impact Snowflake.

Re: [PR] feat(rust): add complete FFI bindings [arrow-adbc]

2024-04-24 Thread via GitHub
lidavidm merged PR #1742: URL: https://github.com/apache/arrow-adbc/pull/1742

Re: [PR] GH-41262: [Java][FlightSQL] Implement stateless prepared statements [arrow]

2024-04-24 Thread via GitHub
lidavidm commented on PR #41237: URL: https://github.com/apache/arrow/pull/41237#issuecomment-2074933102 We have already re-run. The tests themselves appear to get stuck.

Re: [PR] GH-37938: [Swift] initial impl of C Data interface [arrow]

2024-04-24 Thread via GitHub
abandy commented on code in PR #41342: URL: https://github.com/apache/arrow/pull/41342#discussion_r1577677057 ## swift/Arrow/Sources/Arrow/ArrowCExporter.swift: ## @@ -0,0 +1,134 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] MINOR: [Java] Bump org.apache.calcite.avatica:avatica from 1.24.0 to 1.25.0 in /java [arrow]

2024-04-24 Thread via GitHub
vibhatha commented on PR #41212: URL: https://github.com/apache/arrow/pull/41212#issuecomment-2074830020 @github-actions crossbow submit -g java

Re: [I] [C++] ReadNext in arrow::RecordBatchReader returns invalid status on second or subsequent items [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on issue #41339: URL: https://github.com/apache/arrow/issues/41339#issuecomment-2074922591 Aha, I use master code and run `ReadInBatches` in `cpp/examples/arrow/parquet_read_write`'s `ReadInBatches`: ```c++ arrow::Status ReadInBatches(std::string path_to_file) {

Re: [PR] GH-41323: [R] Redo how summarize() evaluates expressions [arrow]

2024-04-24 Thread via GitHub
thisisnic commented on code in PR #41223: URL: https://github.com/apache/arrow/pull/41223#discussion_r1577938152 ## r/NEWS.md: ## @@ -19,6 +19,9 @@ # arrow 16.0.0.9000 +* R functions that users write that use functions that Arrow supports in dataset queries now can be

Re: [I] [C++][Compute] Add quotient and modulo kernels [arrow]

2024-04-24 Thread via GitHub
randolf-scholz commented on issue #28497: URL: https://github.com/apache/arrow/issues/28497#issuecomment-2075048359 A modulo kernel would also be nice for `duration` types (which are just integers under the hood). For instance, in time series a common task is autodetect the frequency at
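Durations are integers under the hood, so a modulo kernel for them reduces to integer modulo on the stored units. One detail any such kernel has to settle is the sign convention. C++ `/` and `%` truncate toward zero, while floor-based quotient and modulo (as in Python's `//` and `%`) differ for negative operands. The following sketches floor-based semantics over `int64_t` durations plus a "falls on a multiple of this step" check. It is one possible choice, not Arrow's behavior, and all function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Floor-based quotient and modulo. Assumes b != 0 and no INT64_MIN / -1 overflow.
int64_t FloorQuotient(int64_t a, int64_t b) {
  int64_t q = a / b;                               // truncates toward zero
  if (a % b != 0 && ((a < 0) != (b < 0))) --q;     // adjust toward -infinity
  return q;
}

int64_t FloorModulo(int64_t a, int64_t b) {
  int64_t r = a % b;                               // has the sign of a
  if (r != 0 && ((r < 0) != (b < 0))) r += b;      // give it the sign of b
  return r;
}

// Example use on durations in a common unit (e.g. seconds): does every value
// fall on a multiple of a candidate step?
bool AllMultiplesOf(const std::vector<int64_t>& durations, int64_t step) {
  for (int64_t d : durations) {
    if (FloorModulo(d, step) != 0) return false;
  }
  return true;
}
```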

Re: [PR] feat(rust): add the driver exporter [arrow-adbc]

2024-04-24 Thread via GitHub
alexandreyc commented on PR #1756: URL: https://github.com/apache/arrow-adbc/pull/1756#issuecomment-2075131699 I'm not sure why the CI is broken... Maybe a cache invalidation issue? CC @mbrobbel

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075170624 Revision: 32c530e692d239fb659058dc0eacf44c225d65f5 Submitted crossbow builds: [ursacomputing/crossbow @

[I] Release arrow-rs / parquet version `52.0.0` [arrow-rs]

2024-04-24 Thread via GitHub
alamb opened a new issue, #5688: URL: https://github.com/apache/arrow-rs/issues/5688 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** 50.1.0 https://github.com/apache/arrow-rs/issues/5453 was released about 2 months ago:

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on code in PR #41335: URL: https://github.com/apache/arrow/pull/41335#discussion_r1578119685 ## cpp/src/arrow/acero/query_context.cc: ## @@ -23,6 +23,36 @@ namespace arrow { using arrow::internal::CpuInfo; namespace acero { +namespace internal { +
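The PR adds an environment variable to tune the temp stack size. The usual shape of such a knob is to read the variable, parse it strictly, and fall back to the built-in default when it is unset or invalid. The following is a minimal sketch of that pattern. The variable name is passed in because the snippet does not show the name chosen in the PR. `ParseStackSize`, `TempStackSizeFromEnv` and the min/max bounds are hypothetical.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Parse a size override; fall back to default_size when the value is missing,
// empty, not a plain decimal integer, or outside [min_size, max_size].
int64_t ParseStackSize(const char* raw, int64_t default_size, int64_t min_size,
                       int64_t max_size) {
  if (raw == nullptr || *raw == '\0') return default_size;
  char* end = nullptr;
  errno = 0;
  const long long value = std::strtoll(raw, &end, 10);
  if (errno != 0 || *end != '\0') return default_size;  // overflow or trailing junk
  if (value < min_size || value > max_size) return default_size;
  return static_cast<int64_t>(value);
}

// Hypothetical wrapper: the caller supplies the environment variable name.
int64_t TempStackSizeFromEnv(const char* var_name, int64_t default_size,
                             int64_t min_size, int64_t max_size) {
  return ParseStackSize(std::getenv(var_name), default_size, min_size, max_size);
}
```

Keeping the parser separate from `std::getenv` lets it be tested without mutating the process environment.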

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075282535 Will merge in 2 days if no negative comments

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on code in PR #41335: URL: https://github.com/apache/arrow/pull/41335#discussion_r1578128252 ## cpp/src/arrow/acero/query_context.cc: ## @@ -23,6 +23,36 @@ namespace arrow { using arrow::internal::CpuInfo; namespace acero { +namespace internal { +

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
zeroshade commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075414280 @github-actions crossbow submit preview-docs

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
zeroshade commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075523237 @github-actions crossbow submit preview-docs

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on code in PR #41335: URL: https://github.com/apache/arrow/pull/41335#discussion_r1578123814 ## cpp/src/arrow/acero/query_context.cc: ## @@ -23,6 +23,36 @@ namespace arrow { using arrow::internal::CpuInfo; namespace acero { +namespace internal { +

Re: [I] [C++] Move LocalFileSystem to a separate module [arrow]

2024-04-24 Thread via GitHub
raulcd commented on issue #40342: URL: https://github.com/apache/arrow/issues/40342#issuecomment-2075346442 Shouldn't this be 17.0.0? For some of our downstream packages we might require some changes (like conda)

Re: [I] [C++][Acero] Support join matching missing value [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on issue #41358: URL: https://github.com/apache/arrow/issues/41358#issuecomment-2075509417 Acero does support this kind of join, which is enabled by `JoinKeyCmp::IS`:

Re: [I] [C++][Acero] Support join matching missing value [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on issue #41358: URL: https://github.com/apache/arrow/issues/41358#issuecomment-2075508642 Acero does support this kind of join, which is enabled by `JoinKeyCmp::IS`:
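As we understand Acero's join key comparison, `JoinKeyCmp::EQ` behaves like SQL `=` (a null key never matches, not even another null) and `JoinKeyCmp::IS` behaves like `IS NOT DISTINCT FROM` (two nulls match). The following is a standalone sketch of those per-key semantics using `std::optional` for nullable keys. `KeyCmp` and `KeysMatch` are hypothetical mirrors, not Acero's API. In Acero the choice is made per key through the hash join node's options.

```cpp
#include <optional>

// Hypothetical mirror of the two comparison modes.
enum class KeyCmp { EQ, IS };

// EQ: null never matches. IS: null matches null, and never a non-null value.
bool KeysMatch(const std::optional<int>& left, const std::optional<int>& right,
               KeyCmp cmp) {
  if (!left.has_value() || !right.has_value()) {
    return cmp == KeyCmp::IS && !left.has_value() && !right.has_value();
  }
  return *left == *right;
}
```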

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-24 Thread via GitHub
Kimahriman commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2075677843 I'll probably just keep the hdfs-native-object-store crate, so maybe just use that same name for the repo? don't think `datafusion` really needs to be in the project name since

Re: [I] [R] Error: package or namespace load failed for ‘arrow’ in inDL(x, as.logical(local), as.logical(now), ...): [arrow]

2024-04-24 Thread via GitHub
Aariq commented on issue #32558: URL: https://github.com/apache/arrow/issues/32558#issuecomment-2075713199 I suspect this is maybe still an issue? I just had a learner in a workshop encounter this error on Windows: ``` > library(arrow) Error: package or namespace load failed

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075749184 Revision: c106aae233ae504dc5bcf7124a0d418568086fb7 Submitted crossbow builds: [ursacomputing/crossbow @

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
zeroshade commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075745836 @github-actions crossbow submit preview-docs

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075193327 > not sure in which direction: your above proposal " // Check all columns has same row-size" or the alternative proposal I made in

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075270692 > I mean we can check it here: [#41320 (review)](https://github.com/apache/arrow/pull/41320#pullrequestreview-2019926584) . ok, closing that PR, and opening

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-04-24 Thread via GitHub
ianmcook commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2075292973 > I'd be curious what others think of this approach as opposed to actually making a format change to include statistics alongside the record batches in the API I think the

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-24 Thread via GitHub
alamb commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2075755439 Created https://github.com/datafusion-contrib/hdfs-native-object-store and invited you as an admin

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075828859 > is expected to be valid prior to using it with the batch reader done

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-04-24 Thread via GitHub
zeroshade commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2075167964 I'd be curious what others think of this approach as opposed to actually making a format change to include statistics alongside the record batches in the API. Particularly in the

Re: [PR] ci: Add pipeline support to bundle Go binaries in NuGet packages [arrow-adbc]

2024-04-24 Thread via GitHub
davidhcoe commented on PR #1730: URL: https://github.com/apache/arrow-adbc/pull/1730#issuecomment-2075217242 This is ready now, although I am not sure why the `Packaging / Python amd64 macOS (pull_request) ` check is failing. It doesn't _appear_ related to my changes.

[PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault opened a new pull request, #41366: URL: https://github.com/apache/arrow/pull/41366 ### Rationale for this change Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a corrupted Parquet file ### What changes are included in this PR? Add a

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault closed pull request #41320: GH-41317: [C++] Fix crash on invalid Parquet file URL: https://github.com/apache/arrow/pull/41320

Re: [PR] GH-41367: [C++] Replace [[maybe_unused]] with Arrow macro [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #41359: URL: https://github.com/apache/arrow/pull/41359#issuecomment-2075364286 :warning: GitHub issue #41367 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-40636: [C++][Parquet] Improve fallback encoding choice in column writer. [arrow]

2024-04-24 Thread via GitHub
ClifHouck commented on code in PR #40957: URL: https://github.com/apache/arrow/pull/40957#discussion_r1578252008 ## cpp/src/parquet/encoding.cc: ## @@ -3988,4 +3988,56 @@ std::unique_ptr MakeDictDecoder(Type::type type_num, } } // namespace detail + // + // +bool

Re: [PR] GH-37929: [Python] begin moving static settings to pyproject.toml [arrow]

2024-04-24 Thread via GitHub
anjakefala commented on PR #41041: URL: https://github.com/apache/arrow/pull/41041#issuecomment-2075463670 `setuptools_scm` does have logging. I'm going to see if the logging helps reveal anything. If nothing, I'll explore the alternatives!

Re: [I] feat: 1.0.0 libraries release (tracking issue) [arrow-adbc]

2024-04-24 Thread via GitHub
CurtHagenlocher commented on issue #1490: URL: https://github.com/apache/arrow-adbc/issues/1490#issuecomment-207540 It would be good to iron out the exact scope of what a 1.0.0 would include. I'd like (for instance) to make some breaking changes to some C# APIs to clean them up a bit

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-04-24 Thread via GitHub
zanmato1984 commented on code in PR #41335: URL: https://github.com/apache/arrow/pull/41335#discussion_r1578128252 ## cpp/src/arrow/acero/query_context.cc: ## @@ -23,6 +23,36 @@ namespace arrow { using arrow::internal::CpuInfo; namespace acero { +namespace internal { +

Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

2024-04-24 Thread via GitHub
jbonofre commented on PR #41187: URL: https://github.com/apache/arrow/pull/41187#issuecomment-2075391064 Sure thing! I'm traveling this week, but I should find time ;)

Re: [I] feat: adding HDFS support in the object_store crate [arrow-rs]

2024-04-24 Thread via GitHub
alamb commented on issue #5638: URL: https://github.com/apache/arrow-rs/issues/5638#issuecomment-2075535831 https://github.com/datafusion-contrib/datafusion-objectstore-hdfs already exists. Would you like to make one like

Re: [PR] feat(go/adbc/driver/bigquery): add support for Google BigQuery [arrow-adbc]

2024-04-24 Thread via GitHub
zeroshade commented on code in PR #1722: URL: https://github.com/apache/arrow-adbc/pull/1722#discussion_r1578500656 ## go/adbc/driver/bigquery/connection.go: ## @@ -0,0 +1,868 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41367: [C++] Replace [[maybe_unused]] with Arrow macro [arrow]

2024-04-24 Thread via GitHub
kou commented on PR #41359: URL: https://github.com/apache/arrow/pull/41359#issuecomment-2075824614 The "AMD64 macOS 12 GLib & Ruby" failure: #41369

Re: [PR] Feature gate arrow-flight client [arrow-rs]

2024-04-24 Thread via GitHub
leoyvens commented on PR #5683: URL: https://github.com/apache/arrow-rs/pull/5683#issuecomment-2075300691 For arrow-flight it is truly only required for the high-level client, not for the server since `FlightServiceServer` is just a `Service` trait implementation. But I'm happy to rename

Re: [PR] GH-41186: [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41187: URL: https://github.com/apache/arrow/pull/41187#issuecomment-2075397876 Oh, sorry to bother you. Enjoy your travels!

Re: [PR] GH-41179: [Docs] Documentation for Dissociated IPC Protocol [arrow]

2024-04-24 Thread via GitHub
github-actions[bot] commented on PR #41180: URL: https://github.com/apache/arrow/pull/41180#issuecomment-2075417969 Revision: f5f644c596a599f318fcc9060d51e7cdcfc1d1ca Submitted crossbow builds: [ursacomputing/crossbow @
