[GitHub] [arrow-adbc] dependabot[bot] opened a new pull request, #54: Bump postgresql from 42.4.0 to 42.4.1 in /java/driver/jdbc-validation-postgresql

2022-08-05 Thread GitBox


dependabot[bot] opened a new pull request, #54:
URL: https://github.com/apache/arrow-adbc/pull/54

   Bumps [postgresql](https://github.com/pgjdbc/pgjdbc) from 42.4.0 to 42.4.1.
   
   Changelog
   Sourced from [postgresql's changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md).
   
   Changelog
   Notable changes since version 42.0.0, read the complete [History of Changes](https://jdbc.postgresql.org/documentation/changelog.html).
   The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
   [Unreleased]
   Changed
   Added
   Fixed
   [42.4.1] (2022-08-01 16:24:20 -0400)
   Security
   
   fix: CVE-2022-31197 Fixes SQL generated in PgResultSet.refresh() to 
escape column identifiers so as to prevent SQL injection.
   
   Previously, the column names for both key and data columns in the table 
were copied as-is into the generated
   SQL. This allowed a malicious table with column names that include statement 
terminator to be parsed and
   executed as multiple separate commands.
   Also adds a new test class ResultSetRefreshTest to verify this 
change.
   Reported by [Sho Kato](https://github.com/kato-sho)
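
The fix described above comes down to quoting identifiers before they are spliced into generated SQL. A minimal sketch of that idea in C, not pgjdbc's actual code: `quote_identifier` is a hypothetical helper that wraps a name in double quotes and doubles any embedded double quote, following the SQL rule for delimited identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of pgjdbc): writes a double-quoted SQL
 * identifier into out. Embedded double quotes are doubled so the name
 * cannot terminate the identifier early. Returns 0 on success, -1 if
 * out_size is too small. */
static int quote_identifier(const char* name, char* out, size_t out_size) {
  assert(name != NULL && out != NULL);
  size_t pos = 0;
  if (out_size < 3) return -1; /* opening quote, closing quote, terminator */
  out[pos++] = '"';
  for (const char* p = name; *p != '\0'; p++) {
    size_t needed = (*p == '"') ? 2 : 1;
    /* keep room for the closing quote and the terminator */
    if (pos + needed + 2 > out_size) return -1;
    if (*p == '"') out[pos++] = '"';
    out[pos++] = *p;
  }
  out[pos++] = '"';
  out[pos] = '\0';
  return 0;
}
```

With this kind of quoting, a column name that contains a statement terminator stays inside a single quoted identifier instead of being parsed as a separate command.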
   
   
   
   Changed
   
   chore: skip publishing pgjdbc-osgi-test to Central
   chore: bump Gradle to 7.5
   test: update JUnit to 5.8.2
   
   Added
   
   chore: added Gradle Wrapper Validation for verifying 
gradle-wrapper.jar
   chore: added "permissions: contents: read" for GitHub Actions 
to avoid unintentional modifications by the CI
   chore: support building pgjdbc with Java 17
   
   Fixed
   
   
   
   Commits
   
   - [bd91c4c](https://github.com/pgjdbc/pgjdbc/commit/bd91c4cc76cdfc1ffd0322be80c85ddfe08a38c2) Prepare for release ([#2580](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2580))
   - [739e599](https://github.com/pgjdbc/pgjdbc/commit/739e599d52ad80f8dcd6efedc6157859b1a9d637) Merge pull request from GHSA-r38f-c4h4-hqq2
   - [736f959](https://github.com/pgjdbc/pgjdbc/commit/736f9598c5b32a19c645ad33f118d2c9c266e90e) fix: replace syncronization in Connection.close with compareAndSet
   - [4673fd2](https://github.com/pgjdbc/pgjdbc/commit/4673fd271c63a24b2a363149945187bad911888a) feat: synchronize statement executions (e.g. avoid deadlock when Connection.i...
   - [fd31a06](https://github.com/pgjdbc/pgjdbc/commit/fd31a06f9c64a2ad69ce274de99ec31d0e1c3b6d) update the website content ([#2578](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2578))
   - [a6044d0](https://github.com/pgjdbc/pgjdbc/commit/a6044d05b80e1bda2fbe2f4e6bd0a714b8e74030) set a timeout to get the return from requesting SSL upgrade. ([#2572](https://github-redirect.dependabot.com/pgjdbc/pgjdbc/issues/2572))
   - [58d6fa0](https://github.com/pgjdbc/pgjdbc/commit/58d6fa085fef483d5f972146c9e7e8f805d144d9) test: bump system-stubs-jupiter to 2.0.1 to support Java 16+
   - [b452d8c](https://github.com/pgjdbc/pgjdbc/commit/b452d8c6d16ffdcd79495e5857ce9ba37bd8a87b) test: avoid concurrent executions of tests that update environment and system...
   - [aa5758a](https://github.com/pgjdbc/pgjdbc/commit/aa5758a18893ced9c1b20655be6042444d746440) test: update JUnit to 5.8.2
   - [36cd24c](https://github.com/pgjdbc/pgjdbc/commit/36cd24c300118c36a8b408665118a1f83b82751d) fix: log connection URL when it can't be parsed
   - Additional commits viewable in [compare view](https://github.com/pgjdbc/pgjdbc/compare/REL42.4.0...REL42.4.1)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.postgresql:postgresql&package-manager=maven&previous-version=42.4.0&new-version=42.4.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close thi

[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #14: Owning/mutable `struct ArrowArray`

2022-08-05 Thread GitBox


lidavidm commented on code in PR #14:
URL: https://github.com/apache/arrow-nanoarrow/pull/14#discussion_r939166568


##
src/nanoarrow/typedefs_inline.h:
##
@@ -165,6 +212,20 @@ struct ArrowBitmap {
   int64_t size_bits;
 };
 
+/// \brief A structure used as the private data member for ArrowArrays allocated here

Review Comment:
   nit: does this need to be in the public header?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #12: Add metadata builder functions

2022-08-05 Thread GitBox


lidavidm commented on code in PR #12:
URL: https://github.com/apache/arrow-nanoarrow/pull/12#discussion_r939160612


##
src/nanoarrow/nanoarrow.h:
##
@@ -261,6 +261,24 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key,
                                      const char* default_value,
                                      struct ArrowStringView* value_out);
 
+/// \brief Initialize a builder for schema metadata from key/value pairs
+ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer, const char* metadata);

Review Comment:
   The `metadata` param is an existing metadata buffer? (It's also not tested)
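
If `metadata` is indeed an existing buffer in the Arrow C data interface encoding (an int32 pair count followed by int32-length-prefixed keys and values, in native byte order), then copying it requires walking it to find its size. A rough sketch of that walk follows; `example_metadata_size_of` is a hypothetical stand-in for `ArrowMetadataSizeOf()`, and treating NULL as empty metadata is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ArrowMetadataSizeOf(): walks the encoded
 * key/value pairs and returns the total size in bytes. Treating NULL as
 * "no metadata" (size 0) is an assumption of this sketch. */
static int64_t example_metadata_size_of(const char* metadata) {
  if (metadata == NULL) return 0;

  int64_t pos = 0;
  int32_t n_pairs;
  memcpy(&n_pairs, metadata + pos, sizeof(int32_t));
  pos += (int64_t)sizeof(int32_t);

  /* Each pair contributes two length-prefixed byte strings. */
  for (int32_t i = 0; i < n_pairs * 2; i++) {
    int32_t len;
    memcpy(&len, metadata + pos, sizeof(int32_t));
    pos += (int64_t)sizeof(int32_t) + len;
  }
  return pos;
}
```

A test for the `metadata` parameter could build a small encoded buffer by hand and check that the builder starts out with exactly that many bytes.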



##
src/nanoarrow/metadata.c:
##
@@ -114,8 +114,156 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key,
   return NANOARROW_OK;
 }
 
+ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key,
+                                     const char* default_value,
+                                     struct ArrowStringView* value_out) {
+  struct ArrowStringView key_view = {key, strlen(key)};
+  return ArrowMetadataGetValueView(metadata, &key_view, default_value, value_out);
+}
+
 char ArrowMetadataHasKey(const char* metadata, const char* key) {
   struct ArrowStringView value;
   ArrowMetadataGetValue(metadata, key, NULL, &value);
   return value.data != NULL;
 }
+
+ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer,
+                                        const char* metadata) {
+  ArrowBufferInit(buffer);
+  int result = ArrowBufferAppend(buffer, metadata, ArrowMetadataSizeOf(metadata));
+  if (result != NANOARROW_OK) {
+    return result;
+  }
+
+  return NANOARROW_OK;
+}
+
+ArrowErrorCode ArrowMetadataBuilderAppendView(struct ArrowBuffer* buffer,
+                                              struct ArrowStringView* key,
+                                              struct ArrowStringView* value) {
+  if (value == NULL) {
+    return NANOARROW_OK;
+  }

Review Comment:
   Hmm, to me it's a little weird to accept NULL as the value and then just do 
nothing with it. If we just considered `append(key, NULL)` to be an error, we 
could drop this, and then we could pass the views by value instead of 
indirecting through a pointer
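
A sketch of what that alternative could look like, with the views passed by value and a NULL value rejected instead of silently ignored. All names here (`ExampleStringView`, `ExampleBuffer`, `example_metadata_append`) are hypothetical stand-ins, the buffer is fixed-capacity for brevity, and updating the leading pair count is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the nanoarrow types; names and layout are assumptions. */
struct ExampleStringView {
  const char* data;
  int64_t n_bytes;
};

struct ExampleBuffer {
  uint8_t data[256];
  int64_t size_bytes;
};

/* Appends one length-prefixed key and value. Views are taken by value;
 * a view whose data pointer is NULL is an error rather than a no-op. */
static int example_metadata_append(struct ExampleBuffer* buf,
                                   struct ExampleStringView key,
                                   struct ExampleStringView value) {
  if (key.data == NULL || value.data == NULL) return EINVAL;
  if (key.n_bytes < 0 || key.n_bytes > INT32_MAX) return EINVAL;
  if (value.n_bytes < 0 || value.n_bytes > INT32_MAX) return EINVAL;

  /* Check capacity up front so a failed append leaves the buffer untouched. */
  int64_t needed = 2 * (int64_t)sizeof(int32_t) + key.n_bytes + value.n_bytes;
  if (buf->size_bytes + needed > (int64_t)sizeof(buf->data)) return ENOMEM;

  int32_t key_len = (int32_t)key.n_bytes;
  int32_t value_len = (int32_t)value.n_bytes;
  uint8_t* out = buf->data + buf->size_bytes;
  memcpy(out, &key_len, sizeof(int32_t));
  out += sizeof(int32_t);
  memcpy(out, key.data, (size_t)key_len);
  out += key_len;
  memcpy(out, &value_len, sizeof(int32_t));
  out += sizeof(int32_t);
  memcpy(out, value.data, (size_t)value_len);

  buf->size_bytes += needed;
  return 0;
}
```

Because the views are small (a pointer and a length), passing them by value costs little and removes the question of what a NULL view pointer should mean.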



##
src/nanoarrow/metadata.c:
##
@@ -114,8 +114,156 @@ ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key,
   return NANOARROW_OK;
 }
 
+ArrowErrorCode ArrowMetadataGetValue(const char* metadata, const char* key,
+                                     const char* default_value,
+                                     struct ArrowStringView* value_out) {
+  struct ArrowStringView key_view = {key, strlen(key)};
+  return ArrowMetadataGetValueView(metadata, &key_view, default_value, value_out);
+}
+
 char ArrowMetadataHasKey(const char* metadata, const char* key) {
   struct ArrowStringView value;
   ArrowMetadataGetValue(metadata, key, NULL, &value);
   return value.data != NULL;
 }
+
+ArrowErrorCode ArrowMetadataBuilderInit(struct ArrowBuffer* buffer,
+                                        const char* metadata) {
+  ArrowBufferInit(buffer);
+  int result = ArrowBufferAppend(buffer, metadata, ArrowMetadataSizeOf(metadata));
+  if (result != NANOARROW_OK) {
+    return result;
+  }
+
+  return NANOARROW_OK;
+}
+
+ArrowErrorCode ArrowMetadataBuilderAppendView(struct ArrowBuffer* buffer,

Review Comment:
   Worth possibly exposing this variant too?






[jira] [Created] (ARROW-17328) [C++] Add hash_mode function

2022-08-05 Thread Ian Cook (Jira)
Ian Cook created ARROW-17328:


 Summary: [C++] Add hash_mode function
 Key: ARROW-17328
 URL: https://issues.apache.org/jira/browse/ARROW-17328
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Ian Cook


Arrow currently has a {{mode}} kernel but no {{hash_mode}} kernel.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17327) Parquet should be listed in PyArrow's get_libraries() function

2022-08-05 Thread Steven Silvester (Jira)
Steven Silvester created ARROW-17327:


 Summary: Parquet should be listed in PyArrow's get_libraries() 
function
 Key: ARROW-17327
 URL: https://issues.apache.org/jira/browse/ARROW-17327
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Steven Silvester


We are updating {{PyMongoArrow}} to use PyArrow 8.0, and saw the following 
[failure| 
https://github.com/mongodb-labs/mongo-arrow/runs/7696619223?check_suite_focus=true]
 when building wheels:  "@rpath/libparquet.800.dylib not found".

We overcame the error by explicitly adding "parquet" to the list of libraries 
returned by {{get_libraries}}.  I am happy to submit a PR.

 

 





[GitHub] [arrow-nanoarrow] paleolimbot opened a new pull request, #14: Owning/mutable `struct ArrowArray`

2022-08-05 Thread GitBox


paleolimbot opened a new pull request, #14:
URL: https://github.com/apache/arrow-nanoarrow/pull/14

   Fixes #5 by implementing an Array whose buffer lifecycle is handled by 
`struct ArrowBuffer`.





[GitHub] [arrow-nanoarrow] codecov-commenter commented on pull request #12: Add metadata builder functions

2022-08-05 Thread GitBox


codecov-commenter commented on PR #12:
URL: https://github.com/apache/arrow-nanoarrow/pull/12#issuecomment-1206757501

   # [Codecov](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#12](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (66073ee) into [main](https://codecov.io/gh/apache/arrow-nanoarrow/commit/51e5052ddd08fb424d8c20c86f9d5ea7d7b4ff51?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (51e5052) will **decrease** coverage by `1.77%`.
   > The diff coverage is `75.94%`.
   
   ```diff
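   @@            Coverage Diff             @@
   ##             main      #12      +/-   ##
   ==========================================
   - Coverage   91.97%   90.20%   -1.78%     
   ==========================================
     Files           5        6       +1     
     Lines         798      919     +121     
     Branches       30       38       +8     
   ==========================================
   + Hits          734      829      +95     
   - Misses         41       59      +18     
   - Partials       23       31       +8     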
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [src/nanoarrow/metadata.c](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL25hbm9hcnJvdy9tZXRhZGF0YS5j) | `85.03% <75.94%> (-14.97%)` | :arrow_down: |
   | [src/nanoarrow/buffer\_inline.h](https://codecov.io/gh/apache/arrow-nanoarrow/pull/12/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL25hbm9hcnJvdy9idWZmZXJfaW5saW5lLmg=) | `84.78% <0.00%> (ø)` | |
   
   :mega: Codecov can now indicate which changes are the most critical in Pull 
Requests. [Learn 
more](https://about.codecov.io/product/feature/runtime-insights/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   





[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #11: Inline performance-sensitive functions and their dependencies

2022-08-05 Thread GitBox


paleolimbot commented on issue #11:
URL: https://github.com/apache/arrow-nanoarrow/issues/11#issuecomment-1206753929

   Fixed in #10 





[GitHub] [arrow-nanoarrow] paleolimbot closed issue #11: Inline performance-sensitive functions and their dependencies

2022-08-05 Thread GitBox


paleolimbot closed issue #11: Inline performance-sensitive functions and their 
dependencies
URL: https://github.com/apache/arrow-nanoarrow/issues/11





[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot merged PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10





[GitHub] [arrow-nanoarrow] paleolimbot closed issue #4: Implement bitmap helpers

2022-08-05 Thread GitBox


paleolimbot closed issue #4: Implement bitmap helpers
URL: https://github.com/apache/arrow-nanoarrow/issues/4





[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


lidavidm commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939046140


##
src/nanoarrow/buffer_inline.h:
##
@@ -15,14 +15,20 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED
+#define NANOARROW_BUFFER_INLINE_H_INCLUDED
+
 #include 
-#include 
-#include 
+#include 
 #include 
 
-#include "nanoarrow.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
-static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {
+static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {

Review Comment:
   Ah, interesting. I agree it's probably safe. It wouldn't be an issue if we 
have to change it later for some reason.






[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939045503


##
src/nanoarrow/buffer_inline.h:
##
@@ -15,14 +15,20 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED
+#define NANOARROW_BUFFER_INLINE_H_INCLUDED
+
 #include 
-#include 
-#include 
+#include 
 #include 
 
-#include "nanoarrow.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
-static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {
+static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {

Review Comment:
   Maybe `ArrowPrivateXXX`?






[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939043459


##
src/nanoarrow/buffer_inline.h:
##
@@ -15,14 +15,20 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED
+#define NANOARROW_BUFFER_INLINE_H_INCLUDED
+
 #include 
-#include 
-#include 
+#include 
 #include 
 
-#include "nanoarrow.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
-static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {
+static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {

Review Comment:
   I see...I was copying the pattern used by headers generated by nanopb 
("private" inline functions). Is there a better pattern for functions that have 
to be visible for inline functions but that shouldn't be accessed otherwise?






[GitHub] [arrow-nanoarrow] pitrou commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


pitrou commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939041016


##
src/nanoarrow/buffer_inline.h:
##
@@ -15,14 +15,20 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED
+#define NANOARROW_BUFFER_INLINE_H_INCLUDED
+
 #include 
-#include 
-#include 
+#include 
 #include 
 
-#include "nanoarrow.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
-static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {
+static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {

Review Comment:
   It's used in many C projects though, so most probably can be considered safe.






[jira] [Created] (ARROW-17326) [Go][FlightSQL] Add Support for FlightSQL to Go

2022-08-05 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-17326:
-

 Summary: [Go][FlightSQL] Add Support for FlightSQL to Go
 Key: ARROW-17326
 URL: https://issues.apache.org/jira/browse/ARROW-17326
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Matthew Topol
Assignee: Matthew Topol


Also addresses https://github.com/apache/arrow/issues/12496





[jira] [Created] (ARROW-17325) AQE should use available column statistics from completed query stages

2022-08-05 Thread Andy Grove (Jira)
Andy Grove created ARROW-17325:
--

 Summary: AQE should use available column statistics from completed 
query stages
 Key: ARROW-17325
 URL: https://issues.apache.org/jira/browse/ARROW-17325
 Project: Apache Arrow
  Issue Type: Improvement
  Components: SQL
Reporter: Andy Grove


In QueryStageExec.computeStats we copy partial statistics from materialized query 
stages by calling QueryStageExec#getRuntimeStatistics, which in turn calls 
ShuffleExchangeLike#runtimeStatistics or 
BroadcastExchangeLike#runtimeStatistics.

 

Only dataSize and numOutputRows are copied into the new Statistics object:

 {code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
    val runtimeStats = getRuntimeStatistics
    val dataSize = runtimeStats.sizeInBytes.max(0)
    val numOutputRows = runtimeStats.rowCount.map(_.max(0))
    Some(Statistics(dataSize, numOutputRows, isRuntime = true))
  } else {
    None
  }
{code}

I would like to also copy over the column statistics stored in 
Statistics.attributeMap so that they can be fed back into the logical plan 
optimization phase.

The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do 
not currently provide such column statistics but other custom implementations 
can.





[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


lidavidm commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r939027329


##
src/nanoarrow/bitmap_inline.h:
##
@@ -0,0 +1,323 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED
+#define NANOARROW_BITMAP_INLINE_H_INCLUDED
+
+#include 
+#include 
+
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+static const uint8_t _ArrowkBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128};
+static const uint8_t _ArrowkFlippedBitmask[] = {254, 253, 251, 247, 239, 223, 191, 127};
+static const uint8_t _ArrowkPrecedingBitmask[] = {0, 1, 3, 7, 15, 31, 63, 127};
+static const uint8_t _ArrowkTrailingBitmask[] = {255, 254, 252, 248, 240, 224, 192, 128};

Review Comment:
   Ditto the comment about underscores in names here (unfortunately).
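
For context, lookup tables like these are typically indexed by `i % 8` to pick out one bit of an LSB-first validity bitmap. The sketch below shows that usage with the same table values under non-reserved names; the `kExample...` and `example_bit_*` names are invented for this illustration and are not the PR's API.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the first two tables in the diff, renamed for this sketch. */
static const uint8_t kExampleBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128};
static const uint8_t kExampleFlippedBitmask[] = {254, 253, 251, 247,
                                                 239, 223, 191, 127};

/* Returns 1 if bit i is set, 0 otherwise. */
static inline int example_bit_get(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] & kExampleBitmask[i % 8]) != 0;
}

/* Sets bit i, leaving the other bits of its byte unchanged. */
static inline void example_bit_set(uint8_t* bits, int64_t i) {
  bits[i / 8] |= kExampleBitmask[i % 8];
}

/* Clears bit i by masking with the complement of its single-bit mask. */
static inline void example_bit_clear(uint8_t* bits, int64_t i) {
  bits[i / 8] &= kExampleFlippedBitmask[i % 8];
}
```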



##
src/nanoarrow/buffer_inline.h:
##
@@ -15,14 +15,20 @@
 // specific language governing permissions and limitations
 // under the License.
 
+#ifndef NANOARROW_BUFFER_INLINE_H_INCLUDED
+#define NANOARROW_BUFFER_INLINE_H_INCLUDED
+
 #include 
-#include 
-#include 
+#include 
 #include 
 
-#include "nanoarrow.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
-static int64_t ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {
+static inline int64_t _ArrowGrowByFactor(int64_t current_capacity, int64_t new_capacity) {

Review Comment:
   It's not allowed to start names with an underscore: 
https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
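
More precisely, C reserves every identifier that begins with an underscore followed by an uppercase letter or a second underscore, and reserves other leading-underscore identifiers at file scope, so `_ArrowGrowByFactor` falls in the reserved set. A sketch of the same helper under a prefix-based "private" name follows; the `ArrowPrivate` prefix and the doubling policy in the body are assumptions for illustration, not the PR's code.

```c
#include <assert.h>
#include <stdint.h>

/* "Private" by convention only: the prefix signals that code outside the
 * inline headers should not call it. The growth policy (at least double,
 * or the requested capacity if that is larger) is assumed for this sketch. */
static inline int64_t ArrowPrivateGrowByFactor(int64_t current_capacity,
                                               int64_t new_capacity) {
  int64_t doubled_capacity = current_capacity * 2;
  return doubled_capacity > new_capacity ? doubled_capacity : new_capacity;
}
```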






[jira] [Created] (ARROW-17324) [Go][CI] Add new Go CI job with -asan

2022-08-05 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-17324:
-

 Summary: [Go][CI] Add new Go CI job with -asan
 Key: ARROW-17324
 URL: https://issues.apache.org/jira/browse/ARROW-17324
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Go
Reporter: Matthew Topol


go1.18 added a "-asan" build option to leverage an equivalent to Address 
Sanitizer in C++. Currently we only build the Go code and run tests using 
go1.16 which does not have the "-asan" option.

Since we want to maintain the backwards compatibility and not yet upgrade to 
go1.18, we should create a new job that runs the tests using go1.18 and the 
"-asan" option to perform additional safety checking.





[jira] [Created] (ARROW-17323) [Go] Clean up and upgrade dependencies

2022-08-05 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-17323:
-

 Summary: [Go] Clean up and upgrade dependencies
 Key: ARROW-17323
 URL: https://issues.apache.org/jira/browse/ARROW-17323
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Matthew Topol
Assignee: Matthew Topol
 Fix For: 10.0.0








[jira] [Created] (ARROW-17322) [Docs] Add issue handling guidance to docs

2022-08-05 Thread Todd Farmer (Jira)
Todd Farmer created ARROW-17322:
---

 Summary: [Docs] Add issue handling guidance to docs
 Key: ARROW-17322
 URL: https://issues.apache.org/jira/browse/ARROW-17322
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Todd Farmer


Per [this mailing list 
discussion|https://lists.apache.org/thread/6crmd1qp093gk1s3l2sjdy88qoqym409], 
it is proposed that the following policies be adopted and documented relative 
to issue handling:
 * Issues should be assigned only when they are being actively worked, or 
expected to be worked in the immediate future. Assigned issues that have not 
been updated in the past 90 days should be reverted to unassigned.
 * All issues "In Progress" require an assignee. Any unassigned issue in "In 
Progress" status should be reverted to "Open" status.
 * Expected usage of issue status and resolution fields should be documented.





[jira] [Created] (ARROW-17321) Update dependencies

2022-08-05 Thread Dominik Moritz (Jira)
Dominik Moritz created ARROW-17321:
--

 Summary: Update dependencies
 Key: ARROW-17321
 URL: https://issues.apache.org/jira/browse/ARROW-17321
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: 9.0.0
Reporter: Dominik Moritz
Assignee: Dominik Moritz
 Fix For: 10.0.0








[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #13: Add coverage badge back (and nudge CI to upload a report so we get PR coverage diffs)

2022-08-05 Thread GitBox


paleolimbot merged PR #13:
URL: https://github.com/apache/arrow-nanoarrow/pull/13





[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #8: Implement element-wise appenders for `struct ArrowArray`s that we allocated

2022-08-05 Thread GitBox


paleolimbot commented on issue #8:
URL: https://github.com/apache/arrow-nanoarrow/issues/8#issuecomment-1206564623

   That's an excellent point, and David's "bag of buffers" comment makes a lot 
of sense. Type-specific appenders are definitely the way to go and I think 
after #5 we'll have what it takes to make the "accumulate a record batch from a 
schema defined at runtime" workflow a thing.





[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1206553114

   Ok, I think I have this at syntax and feature parity with the `struct 
ArrowBuffer` (in preparation for defining an owning `struct ArrowArray` that is 
a `struct ArrowBitmap` + 3 `struct ArrowBuffer`s).





[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1206514130

   I see...I'd been using it to simplify the append process, but the right 
thing to do is to properly bitpack-as-you-append (which is now implemented) so 
that the `ArrowBufferXXX()` functions get called in the same order as the 
`ArrowBitmapXXX()` functions.
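
   As a rough illustration of what bitpacking as you append means, here is a 
minimal Python sketch (not the C code in this PR). `BitmapBuilder` and its 
methods are hypothetical names; the only assumption taken from Arrow is that 
bits are packed least-significant bit first within each byte.

```python
class BitmapBuilder:
    """Hypothetical growable bitmap that packs each bit as it is appended."""

    def __init__(self):
        self.data = bytearray()  # packed bits, least-significant bit first
        self.size_bits = 0       # number of bits appended so far

    def append(self, value):
        # Start a new zeroed byte whenever the previous one is full.
        if self.size_bits % 8 == 0:
            self.data.append(0)
        if value:
            self.data[self.size_bits // 8] |= 1 << (self.size_bits % 8)
        self.size_bits += 1

    def element(self, i):
        # Read back bit i as 0 or 1.
        return (self.data[i // 8] >> (i % 8)) & 1


builder = BitmapBuilder()
for is_valid in [True, False, True, True, False, False, False, False, True]:
    builder.append(is_valid)
```

   Because every append lands directly in its final bit position, no 
intermediate one-byte-per-value buffer is needed.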





[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938856922


##
src/nanoarrow/nanoarrow.h:
##
@@ -483,82 +372,117 @@ ArrowErrorCode ArrowSchemaViewInit(struct ArrowSchemaView* schema_view,
 
 /// }@
 
-/// \defgroup nanoarrow-buffer-builder Growable buffer builders
-
-/// \brief An owning mutable view of a buffer
-struct ArrowBuffer {
-  /// \brief A pointer to the start of the buffer
-  ///
-  /// If capacity_bytes is 0, this value may be NULL.
-  uint8_t* data;
-
-  /// \brief The size of the buffer in bytes
-  int64_t size_bytes;
-
-  /// \brief The capacity of the buffer in bytes
-  int64_t capacity_bytes;
-
-  /// \brief The allocator that will be used to reallocate and/or free the buffer
-  struct ArrowBufferAllocator* allocator;
-};
+/// \defgroup nanoarrow-buffer Owning, growable buffers
 
 /// \brief Initialize an ArrowBuffer
 ///
 /// Initialize a buffer with a NULL, zero-size buffer using the default
 /// buffer allocator.
-void ArrowBufferInit(struct ArrowBuffer* buffer);
+static inline void ArrowBufferInit(struct ArrowBuffer* buffer);
 
 /// \brief Set a newly-initialized buffer's allocator
 ///
 /// Returns EINVAL if the buffer has already been allocated.
-ArrowErrorCode ArrowBufferSetAllocator(struct ArrowBuffer* buffer,
-   struct ArrowBufferAllocator* allocator);
+static inline ArrowErrorCode ArrowBufferSetAllocator(
+struct ArrowBuffer* buffer, struct ArrowBufferAllocator* allocator);
 
 /// \brief Reset an ArrowBuffer
 ///
 /// Releases the buffer using the allocator's free method if
 /// the buffer's data member is non-null, sets the data member
 /// to NULL, and sets the buffer's size and capacity to 0.
-void ArrowBufferReset(struct ArrowBuffer* buffer);
+static inline void ArrowBufferReset(struct ArrowBuffer* buffer);
 
 /// \brief Move an ArrowBuffer
 ///
 /// Transfers the buffer data and lifecycle management to another
 /// address and resets buffer.
-void ArrowBufferMove(struct ArrowBuffer* buffer, struct ArrowBuffer* buffer_out);
+static inline void ArrowBufferMove(struct ArrowBuffer* buffer,
+   struct ArrowBuffer* buffer_out);
 
 /// \brief Grow or shrink a buffer to a given capacity
 ///
 /// When shrinking the capacity of the buffer, the buffer is only reallocated
 /// if shrink_to_fit is non-zero. Calling ArrowBufferResize() does not
 /// adjust the buffer's size member except to ensure that the invariant
 /// capacity >= size remains true.
-ArrowErrorCode ArrowBufferResize(struct ArrowBuffer* buffer, int64_t new_capacity_bytes,
-                                 char shrink_to_fit);
+static inline ArrowErrorCode ArrowBufferResize(struct ArrowBuffer* buffer,
+   int64_t new_capacity_bytes,
+   char shrink_to_fit);
 
 /// \brief Ensure a buffer has at least a given additional capacity
 ///
 /// Ensures that the buffer has space to append at least
 /// additional_size_bytes, overallocating when required.
-ArrowErrorCode ArrowBufferReserve(struct ArrowBuffer* buffer,
-  int64_t additional_size_bytes);
+static inline ArrowErrorCode ArrowBufferReserve(struct ArrowBuffer* buffer,
+int64_t additional_size_bytes);
 
 /// \brief Write data to buffer and increment the buffer size
 ///
 /// This function does not check that buffer has the required capacity
-void ArrowBufferAppendUnsafe(struct ArrowBuffer* buffer, const void* data,
- int64_t size_bytes);
+static inline void ArrowBufferAppendUnsafe(struct ArrowBuffer* buffer, const void* data,
+                                           int64_t size_bytes);
 
 /// \brief Write data to buffer and increment the buffer size
 ///
 /// This function writes and ensures that the buffer has the required capacity,
 /// possibly by reallocating the buffer. Like ArrowBufferReserve, this will
 /// overallocate when reallocation is required.
-ArrowErrorCode ArrowBufferAppend(struct ArrowBuffer* buffer, const void* data,
- int64_t size_bytes);
+static inline ArrowErrorCode ArrowBufferAppend(struct ArrowBuffer* buffer,
+                                               const void* data, int64_t size_bytes);
+
+/// }@
+
+/// \defgroup nanoarrow-bitmap Bitmap utilities
+
+/// \brief Extract a boolean value from a bitmap
+static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i);

Review Comment:
   When I stole Arrow's implementations I also stole all the names I saw! The 
raw bit functions became `ArrowBitXXX` and the functions that operate on an 
owning `struct ArrowBitmap` became `ArrowBitmapXXX` to match the `ArrowBuffer` 
functions...




[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938864854


##
src/nanoarrow/bitmap_inline.h:
##
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED
+#define NANOARROW_BITMAP_INLINE_H_INCLUDED
+
+#include 
+#include 
+
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) {
+  const int8_t* bitmap_char = (const int8_t*)bitmap;
+  return 0 != (bitmap_char[i / 8] & ((int8_t)0x01) << (i % 8));
+}
+
+static inline void ArrowBitmapSetElement(void* bitmap, int64_t i, int8_t value) {
+  int8_t* bitmap_char = (int8_t*)bitmap;
+  int8_t mask = 0x01 << (i % 8);
+  if (value) {
+    bitmap_char[i / 8] |= mask;
+  } else {
+    bitmap_char[i / 8] &= ~mask;
+  }
+}
+
+static inline int64_t ArrowBitmapCountTrue(const void* bitmap, int64_t i_from,
+                                           int64_t i_to) {
+  int64_t count = 0;
+  for (int64_t i = i_from; i < i_to; i++) {
+    count += ArrowBitmapElement(bitmap, i);

Review Comment:
   I passed on the compiler intrinsics for now because I don't have CI to test 
multiple compilers and make sure that they work, or benchmarks set up to make 
sure they're worth it. I used the pre-computed `kpopcount` array, which is much 
better than the previous version.
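
   For anyone following along, the lookup-table idea looks roughly like the 
Python sketch below (not the C code in this PR). `POPCOUNT` and 
`count_set_bits` are hypothetical names standing in for `kpopcount` and the 
counting function; the sketch assumes a bitmap packed least-significant bit 
first.

```python
# Number of set bits for every possible byte value (0..255), computed once.
POPCOUNT = [bin(byte).count("1") for byte in range(256)]


def count_set_bits(bitmap, i_from, i_to):
    """Count set bits in the half-open bit range [i_from, i_to)."""
    count = 0
    i = i_from
    # Leading bits, one at a time, until a byte boundary is reached.
    while i < i_to and i % 8 != 0:
        count += (bitmap[i // 8] >> (i % 8)) & 1
        i += 1
    # Whole bytes: one table lookup replaces eight single-bit reads.
    while i + 8 <= i_to:
        count += POPCOUNT[bitmap[i // 8]]
        i += 8
    # Trailing bits that do not fill a whole byte.
    while i < i_to:
        count += (bitmap[i // 8] >> (i % 8)) & 1
        i += 1
    return count


bitmap = bytes([0b10110010, 0xFF, 0b00000111])
total = count_set_bits(bitmap, 0, 24)
partial = count_set_bits(bitmap, 3, 20)
```

   A null count then follows as the length of the range minus the number of 
set bits.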






[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-05 Thread GitBox


paleolimbot commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r938858642


##
src/nanoarrow/bitmap_inline.h:
##
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED
+#define NANOARROW_BITMAP_INLINE_H_INCLUDED
+
+#include 
+#include 
+
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) {

Review Comment:
   Done! At least to the extent needed to bitpack a char (DuckDB), an int (R), 
append a bunch of nulls at once, and calculate a null count.






[jira] [Created] (ARROW-17320) Refine pyarrow.parquet API exposure

2022-08-05 Thread Miles Granger (Jira)
Miles Granger created ARROW-17320:
-

 Summary: Refine pyarrow.parquet API exposure
 Key: ARROW-17320
 URL: https://issues.apache.org/jira/browse/ARROW-17320
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Parquet, Python
Reporter: Miles Granger


Spawning from [ARROW-17106|https://issues.apache.org/jira/browse/ARROW-17106], 
moving code from `pyarrow/parquet/__init__` to `pyarrow/parquet/core` and 
re-exporting in `__init__` to maintain the same functionality.


[pyarrow.__init__|https://github.com/apache/arrow/blob/master/python/pyarrow/__init__.py]
 is very careful about what is exposed through the public API by prefixing 
private symbols with underscores, even imports. 

What's exposed at the top level of {{pyarrow.parquet}}, however, is not 
so careful. API calls such as {{pq.FileSystem}}, {{pq.pa.Array}}, 
{{pq.json}} are all valid and should probably be designated as private 
attributes in {{pyarrow.parquet}}.
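
To illustrate the difference, here is a minimal sketch using two throwaway 
modules built in memory. The module names "careless" and "careful", the 
function read_table, and the helper public_names are all hypothetical and do 
not reflect pyarrow's actual code; the point is only that an import aliased 
with a leading underscore no longer shows up as a public attribute.

```python
import types


def public_names(module):
    """Names a user would read as public: those without a leading underscore."""
    return sorted(name for name in vars(module) if not name.startswith("_"))


# A module that imports a helper under its plain name, so it leaks.
careless = types.ModuleType("careless")
exec(
    "import json\n"
    "def read_table(path):\n"
    "    return json.dumps(path)\n",
    careless.__dict__,
)

# The same module with the import aliased to a private name.
careful = types.ModuleType("careful")
exec(
    "import json as _json\n"
    "def read_table(path):\n"
    "    return _json.dumps(path)\n",
    careful.__dict__,
)

print(public_names(careless))
print(public_names(careful))
```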



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17319) pyarrow seems to set default CPU affinity to 0 on shutdown, crashes if CPU 0 is not available

2022-08-05 Thread Mike Gevaert (Jira)
Mike Gevaert created ARROW-17319:


 Summary: pyarrow seems to set default CPU affinity to 0 on 
shutdown, crashes if CPU 0 is not available
 Key: ARROW-17319
 URL: https://issues.apache.org/jira/browse/ARROW-17319
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 9.0.0
 Environment: Ubuntu 20.04 / Python 3.8.10 (default, Jun 22 2022, 
20:18:18)

$ pip list
Package         Version
--------------- -------
numpy           1.23.1
pandas          1.4.3
pip             20.0.2
pkg-resources   0.0.0
pyarrow         9.0.0
python-dateutil 2.8.2
pytz            2022.1
setuptools      44.0.0
six             1.16.0
Reporter: Mike Gevaert


I get the following traceback when exiting python after loading 
{{pyarrow.parquet}}

{code}
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> os.getpid()
25106
>>> import pyarrow.parquet
>>> 
Fatal error condition occurred in 
/opt/vcpkg/buildtrees/aws-c-io/src/9e6648842a-364b708815.clean/source/event_loop.c:72:
 aws_thread_launch(&cleanup_thread, s_event_loop_destroy_async_thread_fn, 
el_group, &thread_options) == AWS_OP_SUCCESS
Exiting Application

Stack trace:

/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200af06) 
[0x7f831b2b3f06]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x20028e5) 
[0x7f831b2ab8e5]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f27e09) 
[0x7f831b1d0e09]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) 
[0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1f25948) 
[0x7f831b1ce948]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x200ba3d) 
[0x7f831b2b4a3d]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x1ee0b46) 
[0x7f831b189b46]
/tmp/venv/lib/python3.8/site-packages/pyarrow/libarrow.so.900(+0x194546a) 
[0x7f831abee46a]
/lib/x86_64-linux-gnu/libc.so.6(+0x468a7) [0x7f831c6188a7]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f831c618a60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfa) [0x7f831c5f608a]
 {code}

To replicate this, one needs to make sure that CPU 0 isn't available to 
schedule tasks on. In our HPC environment, that happens because Slurm uses 
cgroups to constrain CPU usage.

On a Linux workstation, one should be able to:
1) open python as a normal user
2) get the pid
3) as root:
{code}
cd /sys/fs/cgroup/cpuset/
mkdir pyarrow
cd pyarrow
echo 0 > cpuset.mems
echo 1 > cpuset.cpus # sets the cgroup to only have access to cpu 1
echo $PID > tasks
{code}
Then, in the Python environment:
{code}
import pyarrow.parquet
exit()
{code}
Which should trigger the crash.
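
Before importing pyarrow, it may help to confirm that the cgroup actually took 
effect. The sketch below is only a convenience check and not part of the 
reproduction itself; cpu_zero_available is a hypothetical helper name, and it 
assumes a platform where os.sched_getaffinity() exists (Linux does).

```python
import os


def cpu_zero_available(pid=0):
    """Return True if CPU 0 is in the allowed CPU set of `pid` (0 = this process).

    Returns None on platforms where os.sched_getaffinity() is not available.
    """
    if not hasattr(os, "sched_getaffinity"):
        return None
    return 0 in os.sched_getaffinity(pid)


print("CPU 0 available:", cpu_zero_available())
```

If the cpuset above was applied to this process, the check should report that 
CPU 0 is not available.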

Sadly, I couldn't track down which versions of {{aws-c-common}} and {{aws-c-io}} 
are being used for the 9.0.0 py38 manylinux wheels. (libarrow.so.900 has 
BuildID[sha1]=dd6c5a2efd5cacf09657780a58c40f7c930e4df1)





[jira] [Created] (ARROW-17318) [C++][Dataset] Support async streaming interface for getting fragments in Dataset

2022-08-05 Thread Pavel Solodovnikov (Jira)
Pavel Solodovnikov created ARROW-17318:
--

 Summary: [C++][Dataset] Support async streaming interface for 
getting fragments in Dataset
 Key: ARROW-17318
 URL: https://issues.apache.org/jira/browse/ARROW-17318
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Pavel Solodovnikov
Assignee: Pavel Solodovnikov


Add `GetFragmentsAsync()` and `GetFragmentsAsyncImpl()` functions to the 
generic `Dataset` interface, which allow fragments to be produced in a streamed 
fashion.

This is one of the prerequisites for making `FileSystemDataset` support lazy 
fragment processing, which, in turn, can be used to start scan operations 
without waiting for the entire dataset to be discovered.

To aid the transition process of moving to async implementation in 
`Dataset`/`AsyncScanner` code, a default implementation for 
`GetFragmentsAsyncImpl()` should be provided (yielding a VectorGenerator over 
the fragments vector, which is stored by every implementation of the `Dataset` 
interface at the moment).
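
The shape of the idea can be sketched with Python async generators. This is an 
analogy only: the actual change concerns the C++ `Dataset` class, and every 
name below (`Dataset`, `LazyDataset`, `get_fragments_async`, `scan`) is 
hypothetical. The base class yields from an already materialized list, which 
plays the role of the default implementation, while a subclass can override it 
to yield fragments as they are discovered.

```python
import asyncio


class Dataset:
    """Hypothetical dataset that already holds all of its fragments."""

    def __init__(self, fragments):
        self._fragments = list(fragments)

    async def get_fragments_async(self):
        # Default implementation: stream the stored list one item at a time.
        for fragment in self._fragments:
            yield fragment


class LazyDataset(Dataset):
    """Hypothetical dataset that discovers fragments while being consumed."""

    async def get_fragments_async(self):
        for fragment in self._fragments:
            await asyncio.sleep(0)  # stand-in for discovery I/O
            yield fragment


async def scan(dataset):
    # The consumer can start work on the first fragment before the rest exist.
    seen = []
    async for fragment in dataset.get_fragments_async():
        seen.append(fragment)
    return seen


eager_result = asyncio.run(scan(Dataset(["part-0.parquet", "part-1.parquet"])))
lazy_result = asyncio.run(scan(LazyDataset(["part-0.parquet", "part-1.parquet"])))
```

Callers written against the async method work unchanged with either class, 
which is what makes a default implementation useful during a transition.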





[jira] [Created] (ARROW-17317) [Release][Docs] Normalize previous document version directory

2022-08-05 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-17317:


 Summary: [Release][Docs] Normalize previous document version 
directory
 Key: ARROW-17317
 URL: https://issues.apache.org/jira/browse/ARROW-17317
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
 Fix For: 10.0.0


We should use X.Y instead of X.Y.Z (e.g. 8.0, not 8.0.1) for the name of the 
previous version's document directory.

See also: 
https://github.com/apache/arrow/blob/apache-arrow-9.0.0/dev/release/post-08-docs.sh#L84

The script should accept X.Y.Z such as 8.0.1 and normalize it to X.Y. It'll 
reduce human error.
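
A minimal sketch of the normalization, written in Python for illustration 
rather than as a patch to the shell script. The function name 
normalize_previous_version and the choice to reject anything that is not X.Y 
or X.Y.Z are assumptions.

```python
import re


def normalize_previous_version(version):
    """Reduce 'X.Y.Z' (or 'X.Y') to 'X.Y'; reject anything else."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", version)
    if match is None:
        raise ValueError(f"expected X.Y or X.Y.Z, got {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


print(normalize_previous_version("8.0.1"))
```

Accepting both forms means the release manager can pass either one and still 
end up with the same directory name.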

See also:
* https://github.com/apache/arrow-site/pull/228#issuecomment-1205997067
* https://github.com/apache/arrow-site/pull/228#issuecomment-1206085602


