[GitHub] [arrow-adbc] dependabot[bot] opened a new pull request, #54: Bump postgresql from 42.4.0 to 42.4.1 in /java/driver/jdbc-validation-postgresql
dependabot[bot] opened a new pull request, #54: URL: https://github.com/apache/arrow-adbc/pull/54
Bumps [postgresql](https://github.com/pgjdbc/pgjdbc) from 42.4.0 to 42.4.1.
Changelog
Sourced from [postgresql's changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md).
Notable changes since version 42.0.0, read the complete [History of Changes](https://jdbc.postgresql.org/documentation/changelog.html). The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
[Unreleased]
Changed
Added
Fixed
[42.4.1] (2022-08-01 16:24:20 -0400)
Security
- fix: CVE-2022-31197 Fixes SQL generated in PgResultSet.refresh() to escape column identifiers so as to prevent SQL injection. Previously, the column names for both key and data columns in the table were copied as-is into the generated SQL. This allowed a malicious table with column names that include statement terminator to be parsed and executed as multiple separate commands. Also adds a new test class ResultSetRefreshTest to verify this change. Reported by [Sho Kato](https://github.com/kato-sho)
Changed
- chore: skip publishing pgjdbc-osgi-test to Central
- chore: bump Gradle to 7.5
- test: update JUnit to 5.8.2
Added
- chore: added Gradle Wrapper Validation for verifying gradle-wrapper.jar
- chore: added "permissions: contents: read" for GitHub Actions to avoid unintentional modifications by the CI
- chore: support building pgjdbc with Java 17
Fixed
Commits
- [bd91c4c](https://github.com/pgjdbc/pgjdbc/commit/bd91c4cc76cdfc1ffd0322be80c85ddfe08a38c2) Prepare for release ([#2580](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2580))
- [739e599](https://github.com/pgjdbc/pgjdbc/commit/739e599d52ad80f8dcd6efedc6157859b1a9d637) Merge pull request from GHSA-r38f-c4h4-hqq2
- [736f959](https://github.com/pgjdbc/pgjdbc/commit/736f9598c5b32a19c645ad33f118d2c9c266e90e) fix: replace syncronization in Connection.close with compareAndSet
- [4673fd2](https://github.com/pgjdbc/pgjdbc/commit/4673fd271c63a24b2a363149945187bad911888a) feat: synchronize statement executions (e.g. avoid deadlock when Connection.i...
- [fd31a06](https://github.com/pgjdbc/pgjdbc/commit/fd31a06f9c64a2ad69ce274de99ec31d0e1c3b6d) update the website content ([#2578](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2578))
- [a6044d0](https://github.com/pgjdbc/pgjdbc/commit/a6044d05b80e1bda2fbe2f4e6bd0a714b8e74030) set a timeout to get the return from requesting SSL upgrade. ([#2572](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2572))
- [58d6fa0](https://github.com/pgjdbc/pgjdbc/commit/58d6fa085fef483d5f972146c9e7e8f805d144d9) test: bump system-stubs-jupiter to 2.0.1 to support Java 16+
- [b452d8c](https://github.com/pgjdbc/pgjdbc/commit/b452d8c6d16ffdcd79495e5857ce9ba37bd8a87b) test: avoid concurrent executions of tests that update environment and system...
- [aa5758a](https://github.com/pgjdbc/pgjdbc/commit/aa5758a18893ced9c1b20655be6042444d746440) test: update JUnit to 5.8.2
- [36cd24c](https://github.com/pgjdbc/pgjdbc/commit/36cd24c300118c36a8b408665118a1f83b82751d) fix: log connection URL when it can't be parsed
- Additional commits viewable in [compare view](https://github.com/pgjdbc/pgjdbc/compare/REL42.4.0...REL42.4.1)
[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.postgresql:postgresql&package-manager=maven&previous-version=42.4.0&new-version=42.4.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
---
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close thi
[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #14: Owning/mutable `struct ArrowArray`
lidavidm commented on code in PR #14: URL: https://github.com/apache/arrow-nanoarrow/pull/14#discussion_r939166568 ## src/nanoarrow/typedefs_inline.h: ## @@ -165,6 +212,20 @@ struct ArrowBitmap { int64_t size_bits; }; +/// \brief A structure used as the private data member for ArrowArrays allocated here Review Comment: nit: does this need to be in the public header? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #12: Add metadata builder functions
lidavidm commented on code in PR #12: URL: https://github.com/apache/arrow-nanoarrow/pull/12#discussion_r939160612 ## src/nanoarrow/nanoarrow.h: ## @@ -261,6 +261,24 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key, const char* default_value, struct ArrowStringView* value_out); +/// \brief Initialize a builder for schema metadata from key/value pairs +ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer, const char* metadata); Review Comment: The `metadata` param is an existing metadata buffer? (It's also not tested) ## src/nanoarrow/metadata.c: ## @@ -114,8 +114,156 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key, return NANOARROW_OK; } +ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key, + const char* default_value, + struct ArrowStringView* value_out) { + struct ArrowStringView key_view = {key, strlen(key)}; + return ArrowMetadataGetValueView(metadata, &key_view, default_value, value_out); +} + char ArrowMetadataHasKey(const char* metadata, const char* key) { struct ArrowStringView value; ArrowMetadataGetValue(metadata, key, NULL, &value); return value.data != NULL; } + +ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer, +const char* metadata) { + ArrowBufferInit(buffer); + int result = ArrowBufferAppend(buffer, metadata, ArrowMetadataSizeOf(metadata)); + if (result != NANOARROW_OK) { +return result; + } + + return NANOARROW_OK; +} + +ArrowErrorCode ArrowMetadataBuilderAppendView(struct ArrowBuffer* buffer, + struct ArrowStringView* key, + struct ArrowStringView* value) { + if (value == NULL) { +return NANOARROW_OK; + } Review Comment: Hmm, to me it's a little weird to accept NULL as the value and then just do nothing with it. 
If we just considered `append(key, NULL)` to be an error, we could drop this, and then we could pass the views by value instead of indirecting through a pointer ## src/nanoarrow/metadata.c: ## @@ -114,8 +114,156 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key, return NANOARROW_OK; } +ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key, + const char* default_value, + struct ArrowStringView* value_out) { + struct ArrowStringView key_view = {key, strlen(key)}; + return ArrowMetadataGetValueView(metadata, &key_view, default_value, value_out); +} + char ArrowMetadataHasKey(const char* metadata, const char* key) { struct ArrowStringView value; ArrowMetadataGetValue(metadata, key, NULL, &value); return value.data != NULL; } + +ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer, +const char* metadata) { + ArrowBufferInit(buffer); + int result = ArrowBufferAppend(buffer, metadata, ArrowMetadataSizeOf(metadata)); + if (result != NANOARROW_OK) { +return result; + } + + return NANOARROW_OK; +} + +ArrowErrorCode ArrowMetadataBuilderAppendView(struct ArrowBuffer* buffer, Review Comment: Worth possibly exposing this variant too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
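The by-value suggestion in the review above can be sketched as follows. This is a minimal illustration under stated assumptions, not nanoarrow's implementation: `StringView`, `StringViewFromCStr`, and `AppendPair` are hypothetical names, the output is a fixed caller-provided byte array rather than a `struct ArrowBuffer`, and the length-prefixed layout is only loosely modeled on key/value metadata encoding. The point it shows is that once a missing value is treated as an error, both views can be passed by value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a non-owning string view (not nanoarrow's type). */
struct StringView {
  const char* data;
  int64_t n_bytes;
};

/* Hypothetical helper: wrap a NUL-terminated string; NULL yields {NULL, 0}. */
static inline struct StringView StringViewFromCStr(const char* s) {
  struct StringView v = {s, s == NULL ? 0 : (int64_t)strlen(s)};
  return v;
}

/* Hypothetical appender: writes int32 key length, key bytes, int32 value
 * length, value bytes into out, starting at *size. Views are passed by value;
 * a view with a NULL data pointer is rejected with EINVAL instead of being
 * silently ignored. Returns ENOMEM when the pair does not fit. */
static inline int AppendPair(uint8_t* out, int64_t capacity, int64_t* size,
                             struct StringView key, struct StringView value) {
  if (key.data == NULL || value.data == NULL) return EINVAL;
  if (key.n_bytes > INT32_MAX || value.n_bytes > INT32_MAX) return EINVAL;

  int64_t needed = 2 * (int64_t)sizeof(int32_t) + key.n_bytes + value.n_bytes;
  if (*size + needed > capacity) return ENOMEM;

  int32_t key_len = (int32_t)key.n_bytes;
  int32_t value_len = (int32_t)value.n_bytes;
  uint8_t* cursor = out + *size;

  memcpy(cursor, &key_len, sizeof(int32_t));
  cursor += sizeof(int32_t);
  memcpy(cursor, key.data, (size_t)key.n_bytes);
  cursor += key.n_bytes;
  memcpy(cursor, &value_len, sizeof(int32_t));
  cursor += sizeof(int32_t);
  memcpy(cursor, value.data, (size_t)value.n_bytes);

  *size += needed;
  return 0;
}
```

A caller would write `AppendPair(buf, sizeof(buf), &size, StringViewFromCStr("key"), StringViewFromCStr("value"))` and check the return code; the failed call leaves `size` unchanged.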
[jira] [Created] (ARROW-17328) [C++] Add hash_mode function
Ian Cook created ARROW-17328: Summary: [C++] Add hash_mode function Key: ARROW-17328 URL: https://issues.apache.org/jira/browse/ARROW-17328 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Ian Cook Arrow currently has a {{mode}} kernel but no {{hash_mode}} kernel. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17327) Parquet should be listed in PyArrow's get_libraries() function
Steven Silvester created ARROW-17327: Summary: Parquet should be listed in PyArrow's get_libraries() function Key: ARROW-17327 URL: https://issues.apache.org/jira/browse/ARROW-17327 Project: Apache Arrow Issue Type: Bug Reporter: Steven Silvester We are updating {{PyMongoArrow}} to use PyArrow 8.0, and saw the following [failure| https://github.com/mongodb-labs/mongo-arrow/runs/7696619223?check_suite_focus=true] when building wheels: "@rpath/libparquet.800.dylib not found". We overcame the error by explicitly adding "parquet" to the list of libraries returned by {{get_libraries}}. I am happy to submit a PR.
[GitHub] [arrow-nanoarrow] paleolimbot opened a new pull request, #14: Owning/mutable `struct ArrowArray`
paleolimbot opened a new pull request, #14: URL: https://github.com/apache/arrow-nanoarrow/pull/14 Fixes #5 by implementing an Array whose buffer lifecycle is handled by `struct ArrowBuffer`.
[GitHub] [arrow-nanoarrow] codecov-commenter commented on pull request #12: Add metadata builder functions
codecov-commenter commented on PR #12: URL: https://github.com/apache/arrow-nanoarrow/pull/12#issuecomment-1206757501
# [Codecov](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#12](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (66073ee) into [main](https://codecov.io/gh/apache/arrow-nanoarrow/commit/51e5052ddd08fb424d8c20c86f9d5ea7d7b4ff51?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (51e5052) will **decrease** coverage by `1.77%`.
> The diff coverage is `75.94%`.
```diff
@@            Coverage Diff             @@
##             main      #12      +/-   ##
==========================================
- Coverage   91.97%   90.20%   -1.78%
==========================================
  Files           5        6       +1
  Lines         798      919     +121
  Branches       30       38       +8
==========================================
+ Hits          734      829      +95
- Misses         41       59      +18
- Partials       23       31       +8
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [src/nanoarrow/metadata.c](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL25hbm9hcnJvdy9tZXRhZGF0YS5j) | `85.03% <75.94%> (-14.97%)` | :arrow_down: |
| [src/nanoarrow/buffer\_inline.h](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL25hbm9hcnJvdy9idWZmZXJfaW5saW5lLmg=) | `84.78% <0.00%> (ø)` | |
[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #11: Inline performance-sensitive functions and their dependencies
paleolimbot commented on issue #11: URL: https://github.com/apache/arrow-nanoarrow/issues/11#issuecomment-1206753929 Fixed in #10
[GitHub] [arrow-nanoarrow] paleolimbot closed issue #11: Inline performance-sensitive functions and their dependencies
paleolimbot closed issue #11: Inline performance-sensitive functions and their dependencies URL: https://github.com/apache/arrow-nanoarrow/issues/11
[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot merged PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10
[GitHub] [arrow-nanoarrow] paleolimbot closed issue #4: Implement bitmap helpers
paleolimbot closed issue #4: Implement bitmap helpers URL: https://github.com/apache/arrow-nanoarrow/issues/4
[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
lidavidm commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939046140 ## src/nanoarrow/buffer_inline.h: ## @@ -15,14 +15,20 @@ // specific language governing permissions and limitations // under the License. +#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED +#define NANOARROW_BUFFER_INLINE_H_INCLUDED + #include -#include -#include +#include #include -#include "nanoarrow.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif -static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { +static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { Review Comment: Ah, interesting. I agree it's probably safe. It wouldn't be an issue if we have to change it later for some reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939045503 ## src/nanoarrow/buffer_inline.h: ## @@ -15,14 +15,20 @@ // specific language governing permissions and limitations // under the License. +#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED +#define NANOARROW_BUFFER_INLINE_H_INCLUDED + #include -#include -#include +#include #include -#include "nanoarrow.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif -static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { +static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { Review Comment: Maybe `ArrowPrivateXXX`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939043459 ## src/nanoarrow/buffer_inline.h: ## @@ -15,14 +15,20 @@ // specific language governing permissions and limitations // under the License. +#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED +#define NANOARROW_BUFFER_INLINE_H_INCLUDED + #include -#include -#include +#include #include -#include "nanoarrow.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif -static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { +static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { Review Comment: I see...I was copying the pattern used by headers generated by nanopb ("private" inline functions). Is there a better pattern for functions that have to be visible for inline functions but that shouldn't be accessed otherwise? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
pitrou commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939041016 ## src/nanoarrow/buffer_inline.h: ## @@ -15,14 +15,20 @@ // specific language governing permissions and limitations // under the License. +#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED +#define NANOARROW_BUFFER_INLINE_H_INCLUDED + #include -#include -#include +#include #include -#include "nanoarrow.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif -static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { +static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { Review Comment: It's used in many C projects though, so most probably can be considered safe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ARROW-17326) [Go][FlightSQL] Add Support for FlightSQL to Go
Matthew Topol created ARROW-17326: Summary: [Go][FlightSQL] Add Support for FlightSQL to Go Key: ARROW-17326 URL: https://issues.apache.org/jira/browse/ARROW-17326 Project: Apache Arrow Issue Type: New Feature Reporter: Matthew Topol Assignee: Matthew Topol Also addresses https://github.com/apache/arrow/issues/12496
[jira] [Created] (ARROW-17325) AQE should use available column statistics from completed query stages
Andy Grove created ARROW-17325: Summary: AQE should use available column statistics from completed query stages Key: ARROW-17325 URL: https://issues.apache.org/jira/browse/ARROW-17325 Project: Apache Arrow Issue Type: Improvement Components: SQL Reporter: Andy Grove
In QueryStageExec.computeStats we copy partial statistics from materialized query stages by calling QueryStageExec#getRuntimeStatistics, which in turn calls ShuffleExchangeLike#runtimeStatistics or BroadcastExchangeLike#runtimeStatistics. Only dataSize and numOutputRows are copied into the new Statistics object:
{code:scala}
def computeStats(): Option[Statistics] = if (isMaterialized) {
  val runtimeStats = getRuntimeStatistics
  val dataSize = runtimeStats.sizeInBytes.max(0)
  val numOutputRows = runtimeStats.rowCount.map(_.max(0))
  Some(Statistics(dataSize, numOutputRows, isRuntime = true))
} else {
  None
}
{code}
I would like to also copy over the column statistics stored in Statistics.attributeMap so that they can be fed back into the logical plan optimization phase. The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do not currently provide such column statistics, but other custom implementations can.
[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
lidavidm commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939027329 ## src/nanoarrow/bitmap_inline.h: ## @@ -0,0 +1,323 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED +#define NANOARROW_BITMAP_INLINE_H_INCLUDED + +#include +#include + +#include "buffer_inline.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif + +static const uint8_t _ArrowkBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128}; +static const uint8_t _ArrowkFlippedBitmask[] = {254, 253, 251, 247, 239, 223, 191, 127}; +static const uint8_t _ArrowkPrecedingBitmask[] = {0, 1, 3, 7, 15, 31, 63, 127}; +static const uint8_t _ArrowkTrailingBitmask[] = {255, 254, 252, 248, 240, 224, 192, 128}; Review Comment: Ditto the comment about underscores in names here (unfortunately). ## src/nanoarrow/buffer_inline.h: ## @@ -15,14 +15,20 @@ // specific language governing permissions and limitations // under the License. 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED +#define NANOARROW_BUFFER_INLINE_H_INCLUDED + #include -#include -#include +#include #include -#include "nanoarrow.h" +#include "typedefs_inline.h" + +#ifdef __cplusplus +extern "C" { +#endif -static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { +static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) { Review Comment: It's not allowed to start names with an underscore: https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
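The reserved-names concern raised here can be sketched in C. The C standard reserves identifiers that begin with an underscore followed by an uppercase letter (or a second underscore) for the implementation in every context, which covers names such as `_ArrowGrowByFactor` and `_ArrowkBitmask`. The sketch below uses a project prefix instead, borrowing the `ArrowPrivateXXX` idea floated elsewhere in this thread. The function bodies are assumptions made for illustration (the diff does not show them), and `ArrowPrivateBitGet` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* A prefixed name keeps the helper visible to other inline functions without
 * entering the namespace reserved for the C implementation.
 * Assumed growth policy: double the capacity, or take the requested capacity
 * when doubling is not enough. */
static inline int64_t ArrowPrivateGrowByFactor(int64_t current_capacity,
                                               int64_t new_capacity) {
  assert(current_capacity >= 0 && new_capacity >= 0);
  int64_t doubled = current_capacity * 2;
  return doubled > new_capacity ? doubled : new_capacity;
}

/* Same table as in the diff, renamed without the leading underscore. */
static const uint8_t ArrowPrivateBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128};

/* Hypothetical accessor: read bit i, least-significant bit first per byte. */
static inline int8_t ArrowPrivateBitGet(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] & ArrowPrivateBitmask[i % 8]) != 0;
}
```

The prefix is only a convention: nothing prevents external callers from using these functions, but the name signals intent and stays clear of reserved identifiers.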
[jira] [Created] (ARROW-17324) [Go][CI] Add new Go CI job with -asan
Matthew Topol created ARROW-17324: Summary: [Go][CI] Add new Go CI job with -asan Key: ARROW-17324 URL: https://issues.apache.org/jira/browse/ARROW-17324 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, Go Reporter: Matthew Topol go1.18 added a "-asan" build option to leverage an equivalent to Address Sanitizer in C++. Currently we only build the Go code and run tests using go1.16, which does not have the "-asan" option. Since we want to maintain backwards compatibility and not yet upgrade to go1.18, we should create a new job that runs the tests using go1.18 and the "-asan" option to perform additional safety checking.
[jira] [Created] (ARROW-17323) [Go] Clean up and upgrade dependencies
Matthew Topol created ARROW-17323: Summary: [Go] Clean up and upgrade dependencies Key: ARROW-17323 URL: https://issues.apache.org/jira/browse/ARROW-17323 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Matthew Topol Assignee: Matthew Topol Fix For: 10.0.0
[jira] [Created] (ARROW-17322) [Docs] Add issue handling guidance to docs
Todd Farmer created ARROW-17322: Summary: [Docs] Add issue handling guidance to docs Key: ARROW-17322 URL: https://issues.apache.org/jira/browse/ARROW-17322 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Todd Farmer
Per [this mailing list discussion|https://lists.apache.org/thread/6crmd1qp093gk1s3l2sjdy88qoqym409], it is proposed that the following policies be adopted and documented relative to issue handling:
* Issues should be assigned only when they are being actively worked, or expected to be worked in the immediate future. Assigned issues that have not been updated in the past 90 days should be reverted to unassigned.
* All issues "In Progress" require an assignee. Any unassigned issue in "In Progress" status should be reverted to "Open" status.
* Expected usage of issue status and resolution fields should be documented.
[jira] [Created] (ARROW-17321) Update dependencies
Dominik Moritz created ARROW-17321: Summary: Update dependencies Key: ARROW-17321 URL: https://issues.apache.org/jira/browse/ARROW-17321 Project: Apache Arrow Issue Type: Task Components: JavaScript Affects Versions: 9.0.0 Reporter: Dominik Moritz Assignee: Dominik Moritz Fix For: 10.0.0
[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #13: Add coverage badge back (and nudge CI to upload a report so we get PR coverage diffs)
paleolimbot merged PR #13: URL: https://github.com/apache/arrow-nanoarrow/pull/13
[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #8: Implement element-wise appenders for `struct ArrowArray`s that we allocated
paleolimbot commented on issue #8: URL: https://github.com/apache/arrow-nanoarrow/issues/8#issuecomment-1206564623 That's an excellent point, and David's "bag of buffers" comment makes a lot of sense. Type-specific appenders are definitely the way to go and I think after #5 we'll have what it takes to make the "accumulate a record batch from a schema defined at runtime" workflow a thing.
[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1206553114 Ok, I think I have this with syntax and feature parity with the `struct ArrowBuffer` (in preparation for defining an owning `struct ArrowArray` that is a `struct ArrowBitmap` + 3 `struct ArrowBuffer`s).
[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1206514130 I see...I'd been using it to simplify the append process, but the right thing to do is to properly bitpack-as-you-append (which is now implemented) so that the `ArrowBufferXXX()` functions get called in the same order as the `ArrowBitmapXXX()` functions.
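A rough sketch of what "bitpack-as-you-append" could look like is given below. It is not the code from PR #10: `BitmapSketch`, `BitmapSketchInit`, and `BitmapSketchAppend` are hypothetical names, and the storage is a fixed array rather than a growable `struct ArrowBuffer`. Each appended value is written directly into its bit position (least-significant bit first within each byte, as in Arrow validity bitmaps), so no intermediate one-byte-per-value staging area is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CAPACITY_BYTES 16

/* Hypothetical fixed-capacity bitmap builder (illustration only). */
struct BitmapSketch {
  uint8_t data[SKETCH_CAPACITY_BYTES];
  int64_t size_bits;
};

/* Zero the storage so unset trailing bits are well defined. */
static inline void BitmapSketchInit(struct BitmapSketch* bitmap) {
  memset(bitmap->data, 0, sizeof(bitmap->data));
  bitmap->size_bits = 0;
}

/* Append one boolean, packing it into the next free bit.
 * Returns 0 on success or -1 when the fixed storage is full. */
static inline int BitmapSketchAppend(struct BitmapSketch* bitmap, int8_t value) {
  if (bitmap->size_bits >= (int64_t)SKETCH_CAPACITY_BYTES * 8) return -1;

  int64_t byte_i = bitmap->size_bits / 8;
  uint8_t mask = (uint8_t)(1u << (bitmap->size_bits % 8));
  if (value) {
    bitmap->data[byte_i] |= mask;
  } else {
    bitmap->data[byte_i] &= (uint8_t)~mask;
  }

  bitmap->size_bits++;
  return 0;
}
```

Because every append touches exactly one byte of the output, a growable version could reserve capacity and append in the same sequence of steps that a byte buffer uses, which is the ordering property the comment above describes.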
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938856922 ## src/nanoarrow/nanoarrow.h: ## @@ -483,82 +372,117 @@ ArrowErrorCode ArrowSchemaViewInit(struct ArrowSchemaView* schema_view, /// }@ -/// \defgroup nanoarrow-buffer-builder Growable buffer builders - -/// \brief An owning mutable view of a buffer -struct ArrowBuffer { - /// \brief A pointer to the start of the buffer - /// - /// If capacity_bytes is 0, this value may be NULL. - uint8_t* data; - - /// \brief The size of the buffer in bytes - int64_t size_bytes; - - /// \brief The capacity of the buffer in bytes - int64_t capacity_bytes; - - /// \brief The allocator that will be used to reallocate and/or free the buffer - struct ArrowBufferAllocator* allocator; -}; +/// \defgroup nanoarrow-buffer Owning, growable buffers /// \brief Initialize an ArrowBuffer /// /// Initialize a buffer with a NULL, zero-size buffer using the default /// buffer allocator. -void ArrowBufferInit(struct ArrowBuffer* buffer); +static inline void ArrowBufferInit(struct ArrowBuffer* buffer); /// \brief Set a newly-initialized buffer's allocator /// /// Returns EINVAL if the buffer has already been allocated. -ArrowErrorCode ArrowBufferSetAllocator(struct ArrowBuffer* buffer, - struct ArrowBufferAllocator* allocator); +static inline ArrowErrorCode ArrowBufferSetAllocator( +struct ArrowBuffer* buffer, struct ArrowBufferAllocator* allocator); /// \brief Reset an ArrowBuffer /// /// Releases the buffer using the allocator's free method if /// the buffer's data member is non-null, sets the data member /// to NULL, and sets the buffer's size and capacity to 0. -void ArrowBufferReset(struct ArrowBuffer* buffer); +static inline void ArrowBufferReset(struct ArrowBuffer* buffer); /// \brief Move an ArrowBuffer /// /// Transfers the buffer data and lifecycle management to another /// address and resets buffer. 
-void ArrowBufferMove(struct ArrowBuffer* buffer, struct ArrowBuffer* buffer_out); +static inline void ArrowBufferMove(struct ArrowBuffer* buffer, + struct ArrowBuffer* buffer_out); /// \brief Grow or shrink a buffer to a given capacity /// /// When shrinking the capacity of the buffer, the buffer is only reallocated /// if shrink_to_fit is non-zero. Calling ArrowBufferResize() does not /// adjust the buffer's size member except to ensure that the invariant /// capacity >= size remains true. -ArrowErrorCode ArrowBufferResize(struct ArrowBuffer* buffer, int64_t new_capacity_bytes, - char shrink_to_fit); +static inline ArrowErrorCode ArrowBufferResize(struct ArrowBuffer* buffer, + int64_t new_capacity_bytes, + char shrink_to_fit); /// \brief Ensure a buffer has at least a given additional capacity /// /// Ensures that the buffer has space to append at least /// additional_size_bytes, overallocating when required. -ArrowErrorCode ArrowBufferReserve(struct ArrowBuffer* buffer, - int64_t additional_size_bytes); +static inline ArrowErrorCode ArrowBufferReserve(struct ArrowBuffer* buffer, +int64_t additional_size_bytes); /// \brief Write data to buffer and increment the buffer size /// /// This function does not check that buffer has the required capacity -void ArrowBufferAppendUnsafe(struct ArrowBuffer* buffer, const void* data, - int64_t size_bytes); +static inline void ArrowBufferAppendUnsafe(struct ArrowBuffer* buffer, const void* data, + int64_t size_bytes); /// \brief Write data to buffer and increment the buffer size /// /// This function writes and ensures that the buffer has the required capacity, /// possibly by reallocating the buffer. Like ArrowBufferReserve, this will /// overallocate when reallocation is required. 
-ArrowErrorCode ArrowBufferAppend(struct ArrowBuffer* buffer, const void* data, - int64_t size_bytes); +static inline ArrowErrorCode ArrowBufferAppend(struct ArrowBuffer* buffer, + const void* data, int64_t size_bytes); + +/// }@ + +/// \defgroup nanoarrow-bitmap Bitmap utilities + +/// \brief Extract a boolean value from a bitmap +static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i); Review Comment: When I stole Arrow's implementations I also stole all the names I saw! The raw bit functions became `ArrowBitXXX` and the functions that operate on an owning `struct ArrowBitmap` became `ArrowBitmapXXX` to match the `ArrowBuffer` functions... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938864854 ## src/nanoarrow/bitmap_inline.h: ## @@ -0,0 +1,131 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED +#define NANOARROW_BITMAP_INLINE_H_INCLUDED + +#include +#include + +#include "buffer_inline.h" +#include "typedefs_inline.h" + +static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) { + const int8_t* bitmap_char = (const int8_t*)bitmap; + return 0 != (bitmap_char[i / 8] & ((int8_t)0x01) << (i % 8)); +} + +static inline void ArrowBitmapSetElement(void* bitmap, int64_t i, int8_t value) { + int8_t* bitmap_char = (int8_t*)bitmap; + int8_t mask = 0x01 << (i % 8); + if (value) { +bitmap_char[i / 8] |= mask; + } else { +bitmap_char[i / 8] &= ~mask; + } +} + +static inline int64_t ArrowBitmapCountTrue(const void* bitmap, int64_t i_from, + int64_t i_to) { + int64_t count = 0; + for (int64_t i = i_from; i < i_to; i++) { +count += ArrowBitmapElement(bitmap, i); Review Comment: I passed on the compiler intrinsics for now because I don't have CI to test multiple compilers and make sure that they work or benchmarks set up to make sure they're worth 
it... I used the pre-computed `kpopcount` array, which is much better than the previous version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
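The pre-computed popcount table mentioned in this comment is not shown in the quoted diff, so the following is only a rough Python sketch of the general technique, not the PR's C code. The names `POPCOUNT` and `count_set_bits` are hypothetical, and the bit order is assumed to be least-significant-bit first, as in the quoted `ArrowBitmapElement`.

```python
# Precomputed table: POPCOUNT[b] is the number of set bits in byte value b.
POPCOUNT = [bin(b).count("1") for b in range(256)]


def count_set_bits(bitmap: bytes, i_from: int, i_to: int) -> int:
    """Count set bits in the half-open bit range [i_from, i_to), LSB-first."""
    if i_to <= i_from:
        return 0
    first_byte, last_byte = i_from // 8, (i_to - 1) // 8
    # Masks that keep only the in-range bits of the first and last byte.
    first_mask = (0xFF << (i_from % 8)) & 0xFF
    last_mask = 0xFF >> (7 - (i_to - 1) % 8)
    if first_byte == last_byte:
        return POPCOUNT[bitmap[first_byte] & first_mask & last_mask]
    count = POPCOUNT[bitmap[first_byte] & first_mask]
    # Whole bytes in the middle need one table lookup each.
    count += sum(POPCOUNT[b] for b in bitmap[first_byte + 1:last_byte])
    count += POPCOUNT[bitmap[last_byte] & last_mask]
    return count
```

Compared with testing one bit per loop iteration, this does one lookup per byte and only masks the partial bytes at either end of the range.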
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder
paleolimbot commented on code in PR #10: URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938858642 ## src/nanoarrow/bitmap_inline.h: ## @@ -0,0 +1,131 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED +#define NANOARROW_BITMAP_INLINE_H_INCLUDED + +#include +#include + +#include "buffer_inline.h" +#include "typedefs_inline.h" + +static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) { Review Comment: Done! At least to the extent needed to bitpack a char (DuckDB), an int (R), append a bunch of nulls at once, and calculate a null count.
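To make the "bitpack and calculate a null count" use case concrete, here is a small Python sketch of the idea. It assumes an LSB-first validity bitmap in which a set bit means "valid"; `pack_validity` and `null_count` are illustrative names and not part of nanoarrow.

```python
def pack_validity(values) -> bytes:
    """Pack an iterable of truthy/falsy flags into an LSB-first bitmap."""
    flags = list(values)
    bitmap = bytearray((len(flags) + 7) // 8)  # zero-filled, padded to whole bytes
    for i, flag in enumerate(flags):
        if flag:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def null_count(bitmap: bytes, length: int) -> int:
    """Number of unset (null) bits among the first `length` bits."""
    set_bits = sum((bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))
    return length - set_bits
```

The `length` argument matters because the trailing padding bits of the last byte are zero and must not be counted as nulls.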
[jira] [Created] (ARROW-17320) Refine pyarrow.parquet API exposure
Miles Granger created ARROW-17320:

Summary: Refine pyarrow.parquet API exposure
Key: ARROW-17320
URL: https://issues.apache.org/jira/browse/ARROW-17320
Project: Apache Arrow
Issue Type: Improvement
Components: Parquet, Python
Reporter: Miles Granger

Spawning from [ARROW-17106|https://issues.apache.org/jira/browse/ARROW-17106], moving code from `pyarrow/parquet/__init__` to `pyarrow/parquet/core` and re-exporting in `__init__` to maintain the same functionality.

[pyarrow.__init__|https://github.com/apache/arrow/blob/master/python/pyarrow/__init__.py] is very careful about what is exposed through the public API by prefixing private symbols with underscores, even imports.

What's exposed at the top level of `pyarrow.parquet`, however, is not so careful. API calls such as `pq.FileSystem`, `pq.pa.Array`, `pq.json` are all valid and should probably be designated as private attributes in {{pyarrow.parquet}}.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
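One way to see the kind of leak described here is to list the public attributes of a module that are themselves modules. The sketch below is self-contained and uses a synthetic `demo` module as a hypothetical stand-in, so it makes no claim about what `pyarrow.parquet` actually exposes in any given version.

```python
import json
import os
import types


def leaked_public_modules(module: types.ModuleType) -> list:
    """Names of public (non-underscore) attributes that are themselves modules."""
    return sorted(
        name
        for name, value in vars(module).items()
        if not name.startswith("_") and isinstance(value, types.ModuleType)
    )


# Hypothetical stand-in for a package whose __init__ runs `import json`
# and `import os as _os` at the top level.
demo = types.ModuleType("demo")
demo.json = json  # leaked: reachable as demo.json
demo._os = os  # private by convention, so not reported
demo.read_table = lambda path: path  # intended public API, not a module

print(leaked_public_modules(demo))  # ['json']
```

With pyarrow installed, the same helper could presumably be pointed at `pyarrow.parquet` to get a starting list of candidates for an underscore prefix.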
[jira] [Created] (ARROW-17319) pyarrow seems to set default CPU affinity to 0 on shutdown, crashes if CPU 0 is not available
Mike Gevaert created ARROW-17319:

Summary: pyarrow seems to set default CPU affinity to 0 on shutdown, crashes if CPU 0 is not available
Key: ARROW-17319
URL: https://issues.apache.org/jira/browse/ARROW-17319
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 9.0.0
Environment: Ubuntu 20.02 / Python 3.8.10 (default, Jun 22 2022, 20:18:18)
$ pip list
Package         Version
--------------- -------
numpy           1.23.1
pandas          1.4.3
pip             20.0.2
pkg-resources   0.0.0
pyarrow         9.0.0
python-dateutil 2.8.2
pytz            2022.1
setuptools      44.0.0
six             1.16.0
Reporter: Mike Gevaert

I get the following traceback when exiting python after loading {{pyarrow.parquet}}:

{code}
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> os.getpid()
25106
>>> import pyarrow.parquet
>>>
Fatal error condition occurred in /opt/vcpkg/buildtrees/aws-c-io/src/9e6648842a-364b708815.clean/source/event_loop.c:72: aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application
Stack trace:
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200af06) [0x7f831b2b3f06]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x20028e5) [0x7f831b2ab8e5]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f27e09) [0x7f831b1d0e09]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f25948) [0x7f831b1ce948]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) [0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1ee0b46) [0x7f831b189b46]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x194546a) [0x7f831abee46a]
/lib/x86_64-linux-gnu/libc.so.6(+0x468a7) [0x7f831c6188a7]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f831c618a60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7f831c5f608a]
{code}

To replicate this, one needs to make sure that CPU 0 isn't available to schedule tasks on. In our HPC environment, that happens due to slurm using cgroups to constrain CPU usage. On a linux workstation, one should be able to:
1) open python as a normal user
2) get the pid
3) as root:
{code}
cd /sys/fs/cgroup/cpuset/
mkdir pyarrow
cd pyarrow
echo 0 > cpuset.mems
echo 1 > cpuset.cpus # sets the cgroup to only have access to cpu 1
echo $PID > tasks
{code}
Then, in the python environment:
{code}
import pyarrow.parquet
exit()
{code}
Which should trigger the crash.

Sadly, I couldn't track down which {{aws-c-common}} and {{aws-c-io}} are being used for the 9.0.0 py38 manylinux wheels. (libarrow.so.900 has BuildID[sha1]=dd6c5a2efd5cacf09657780a58c40f7c930e4df1)
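Before importing pyarrow, it may help to confirm that the precondition of the reproduction actually holds, i.e. that CPU 0 is excluded from the process's allowed CPUs. The sketch below uses the standard-library `os.sched_getaffinity`, which only exists on some platforms (notably Linux). The helper names are illustrative, and it is an assumption here that the cpuset cgroup restriction shows up in the affinity mask.

```python
import os


def cpu_zero_allowed(cpus) -> bool:
    """True if CPU 0 is in the given set of allowed CPU indices."""
    return 0 in set(cpus)


def current_affinity(pid: int = 0):
    """Allowed CPUs for `pid` (0 means this process), or None if unsupported."""
    # os.sched_getaffinity is only available on some platforms, notably Linux.
    if not hasattr(os, "sched_getaffinity"):
        return None
    return os.sched_getaffinity(pid)


cpus = current_affinity()
if cpus is None:
    print("affinity query not supported on this platform")
else:
    print(f"allowed CPUs: {sorted(cpus)}; CPU 0 allowed: {cpu_zero_allowed(cpus)}")
```

If the cgroup setup took effect as intended, running this in the constrained interpreter should report that CPU 0 is not allowed.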
[jira] [Created] (ARROW-17318) [C++][Dataset] Support async streaming interface for getting fragments in Dataset
Pavel Solodovnikov created ARROW-17318:

Summary: [C++][Dataset] Support async streaming interface for getting fragments in Dataset
Key: ARROW-17318
URL: https://issues.apache.org/jira/browse/ARROW-17318
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Pavel Solodovnikov
Assignee: Pavel Solodovnikov

Add `GetFragmentsAsync()` and `GetFragmentsAsyncImpl()` functions to the generic `Dataset` interface, which allow fragments to be produced in a streamed fashion.

This is one of the prerequisites for making `FileSystemDataset` support lazy fragment processing, which, in turn, can be used to start scan operations without waiting for the entire dataset to be discovered.

To aid the transition to an async implementation in the `Dataset`/`AsyncScanner` code, a default implementation of `GetFragmentsAsyncImpl()` should be provided (yielding a VectorGenerator over the fragments vector, which every implementation of the Dataset interface stores at the moment).
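The shape of the proposal, a default async implementation that simply streams an already-stored vector while subclasses may override it with lazy discovery, can be sketched in Python with async generators. This is an analogy only: the class and method names below are hypothetical and do not correspond to the Arrow C++ API.

```python
import asyncio
from typing import AsyncIterator, List


class Dataset:
    """Hypothetical Python stand-in for the C++ Dataset interface."""

    def __init__(self, fragments: List[str]):
        self._fragments = fragments  # the already-materialized fragments vector

    async def get_fragments_async(self) -> AsyncIterator[str]:
        # Default implementation: stream the stored vector one item at a time.
        for fragment in self._fragments:
            yield fragment


class LazyDataset(Dataset):
    """Overrides the default to discover fragments one at a time."""

    async def get_fragments_async(self) -> AsyncIterator[str]:
        for fragment in self._fragments:
            await asyncio.sleep(0)  # placeholder for real discovery I/O
            yield fragment


async def collect(dataset: Dataset) -> List[str]:
    """Drain the async stream into a list (a consumer could instead start work early)."""
    return [fragment async for fragment in dataset.get_fragments_async()]
```

Callers written against the async interface work unchanged with either class, which is the point of shipping a default implementation during the transition.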
[jira] [Created] (ARROW-17317) [Release][Docs] Normalize previous document version directory
Kouhei Sutou created ARROW-17317:

Summary: [Release][Docs] Normalize previous document version directory
Key: ARROW-17317
URL: https://issues.apache.org/jira/browse/ARROW-17317
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kouhei Sutou
Fix For: 10.0.0

We should use X.Y instead of X.Y.Z (e.g. 8.0, not 8.0.1) for the previous version's document directory. See also: https://github.com/apache/arrow/blob/apache-arrow-9.0.0/dev/release/post-08-docs.sh#L84

The script should accept X.Y.Z such as 8.0.1 and normalize it to X.Y. This will reduce human error.

See also:
* https://github.com/apache/arrow-site/pull/228#issuecomment-1205997067
* https://github.com/apache/arrow-site/pull/228#issuecomment-1206085602
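The normalization requested here is small enough to sketch. The release script itself is a shell script, so the Python function below only illustrates the intended behavior; the function name and the choice to reject anything other than X.Y or X.Y.Z are assumptions.

```python
import re


def normalize_previous_version(version: str) -> str:
    """Reduce X.Y or X.Y.Z to X.Y; raise ValueError for anything else."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", version.strip())
    if match is None:
        raise ValueError(f"expected X.Y or X.Y.Z, got {version!r}")
    return f"{match.group(1)}.{match.group(2)}"
```

Accepting both forms means the release manager can pass whichever version string is at hand, which is the source of human error the issue aims to remove.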