[GitHub] [arrow-adbc] paleolimbot commented on a diff in pull request #65: [C] Basic libpq-based driver
paleolimbot commented on code in PR #65: URL: https://github.com/apache/arrow-adbc/pull/65#discussion_r944980230

## c/drivers/postgres/statement.cc:

@@ -0,0 +1,283 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "statement.h"

#include <arpa/inet.h>
#include <cerrno>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

#include <adbc.h>
#include <libpq-fe.h>
#include <nanoarrow.h>

#include "connection.h"
#include "util.h"

namespace adbcpq {

namespace {
/// \brief An ArrowArrayStream that reads tuples from a PGresult.
class TupleReader {
 public:
  explicit TupleReader(PGresult* result) : result_(result) {}

  int GetSchema(struct ArrowSchema* out) {
    std::memset(out, 0, sizeof(*out));
    const int num_fields = PQnfields(result_);
    NA_RETURN_NOT_OK(ArrowSchemaInit(out, NANOARROW_TYPE_STRUCT));
    NA_RETURN_NOT_OK(ArrowSchemaAllocateChildren(out, num_fields));
    for (int i = 0; i < num_fields; i++) {
      ArrowType field_type = NANOARROW_TYPE_NA;
      const Oid pg_type = PQftype(result_, i);
      switch (pg_type) {
        // TODO: at startup, query pg_type to build up this mapping instead of
        // hardcoding it
        case 16:  // BOOLOID
          field_type = NANOARROW_TYPE_BOOL;
          break;
        case 20:  // INT8OID
          field_type = NANOARROW_TYPE_INT64;
          break;
        case 21:  // INT2OID
          field_type = NANOARROW_TYPE_INT16;
          break;
        case 23:  // INT4OID
          field_type = NANOARROW_TYPE_INT32;
          break;
        default:
          last_error_ = StringBuilder("[libpq] Column #", i + 1, " (\"",
                                      PQfname(result_, i),
                                      "\") has unknown type code ", pg_type);
          return ENOTSUP;
      }
      NA_RETURN_NOT_OK(ArrowSchemaInit(out->children[i], field_type));
      NA_RETURN_NOT_OK(ArrowSchemaSetName(out->children[i], PQfname(result_, i)));
    }

    NA_RETURN_NOT_OK(ArrowSchemaDeepCopy(out, &schema_));
    return 0;
  }

  int GetNext(struct ArrowArray* out) {
    if (!result_) {
      out->release = nullptr;
      return 0;
    }

    const int num_rows = PQntuples(result_);

    NA_RETURN_NOT_OK(ArrowArrayInit(out, NANOARROW_TYPE_STRUCT));
    NA_RETURN_NOT_OK(ArrowArrayAllocateChildren(out, schema_.n_children));

    std::vector<struct ArrowSchemaView> fields(schema_.n_children);

    for (int col = 0; col < schema_.n_children; col++) {
      NA_RETURN_NOT_OK(ArrowSchemaViewInit(&fields[col], schema_.children[col], nullptr));
      NA_RETURN_NOT_OK(ArrowArrayInit(out->children[col], fields[col].data_type));
      NA_RETURN_NOT_OK(
          ArrowBitmapReserve(ArrowArrayValidityBitmap(out->children[col]), num_rows));
      switch (fields[col].data_type) {
        case NANOARROW_TYPE_INT32:
          NA_RETURN_NOT_OK(ArrowBufferReserve(ArrowArrayBuffer(out->children[col], 1),
                                              num_rows * sizeof(int32_t)));
          break;
        default:
          last_error_ = StringBuilder("[libpq] Column #", col + 1, " (\"",
                                      schema_.children[col]->name,
                                      "\") has unsupported type ", fields[col].data_type);
          return ENOTSUP;
      }
    }

    for (int row = 0; row < num_rows; row++) {
      for (int col = 0; col < schema_.n_children; col++) {
        struct ArrowBitmap* bitmap = ArrowArrayValidityBitmap(out->children[col]);
        NA_RETURN_NOT_OK(ArrowBitmapAppend(bitmap, !PQgetisnull(result_, row, col), 1));

        switch (fields[col].data_type) {
          case NANOARROW_TYPE_INT32: {
            struct ArrowBuffer* buffer = ArrowArrayBuffer(out->children[col], 1);
            // TODO: assert PQgetlength is 4
            NA_RETURN_NOT_OK(ArrowBufferAppendInt32(
                buffer,
                ntohl(*reinterpret_cast<const uint32_t*>(PQgetvalue(result_, row, col)))));
            break;
          }
          default:
            last_error_ = StringBuilder(
                "[libpq] Column #", col + 1, " (\"", schema_.children[col]->name,
                "\") has unsupported type ", fields[col].data_type);
            return ENOTSUP;
        }
      }
    }

    for (int
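The excerpt relies on an NA_RETURN_NOT_OK macro that comes from the PR's util.h and is not quoted above. As a hedged sketch (the real definition may differ), it presumably just propagates nanoarrow's errno-style int return codes:

```cpp
// Illustrative definition only: nanoarrow-style functions return an int
// errno code (0 on success), so the macro evaluates the expression once
// and early-returns any non-zero code to the caller.
#define NA_RETURN_NOT_OK(EXPR)                  \
  do {                                          \
    const int na_status_ = (EXPR);              \
    if (na_status_ != 0) return na_status_;     \
  } while (0)
```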
[jira] [Created] (ARROW-17405) [Java] C Data Interface library (.so / .dylib) able to compile with mvn command
David Dali Susanibar Arce created ARROW-17405:
----------------------------------------------

Summary: [Java] C Data Interface library (.so / .dylib) able to compile with mvn command
Key: ARROW-17405
URL: https://issues.apache.org/jira/browse/ARROW-17405
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation, Java
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce
[jira] [Created] (ARROW-17404) [Java] Consolidate JNI compilation #2
David Dali Susanibar Arce created ARROW-17404:
----------------------------------------------

Summary: [Java] Consolidate JNI compilation #2
Key: ARROW-17404
URL: https://issues.apache.org/jira/browse/ARROW-17404
Project: Apache Arrow
Issue Type: Bug
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce

*Umbrella ticket for the Java JNI compilation consolidation initiative, part 2*

The initial part of the initiative consisted of [Consolidate ORC/Dataset code|https://issues.apache.org/jira/browse/ARROW-15174] and [Separate JNI CMakeLists.txt compilation|https://issues.apache.org/jira/browse/ARROW-17080]. This second part consists of:
* Make the Java library able to compile with a single mvn command
* Make the Java library able to compile from an installed libarrow
* Migrate the remaining C++ code specific to Java into the Java project: Gandiva
* Add a Windows build script that produces DLLs
* Incorporate the Windows DLLs into the Maven packages
* Migrate JNI to use the C Data Interface
[jira] [Created] (ARROW-17403) [Java] C Data Interface library (.so / .dylib) able to compile with mvn command
David Dali Susanibar Arce created ARROW-17403:
----------------------------------------------

Summary: [Java] C Data Interface library (.so / .dylib) able to compile with mvn command
Key: ARROW-17403
URL: https://issues.apache.org/jira/browse/ARROW-17403
Project: Apache Arrow
Issue Type: Sub-task
Components: Documentation, Java
Reporter: David Dali Susanibar Arce
Assignee: David Dali Susanibar Arce
[jira] [Created] (ARROW-17402) [C++] Improve Dataset Write Option Defaults
Kae Suarez created ARROW-17402:
-------------------------------

Summary: [C++] Improve Dataset Write Option Defaults
Key: ARROW-17402
URL: https://issues.apache.org/jira/browse/ARROW-17402
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kae Suarez

Currently, the defaults are suitable when writing a table directly to disk as CSV, IPC, Parquet, etc. Writing a dataset, however, requires multiple options to be configured even when defaults could be obvious: e.g., the fragment basenames require user input, even though they will often be named something like "part{i}.parquet". Ideally, the defaults should be adequate to write a dataset without further user configuration.
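For context, here is a hedged sketch of the boilerplate this ticket targets, written against the Arrow 9.0-era C++ dataset API; the scanner argument, output path, and choice of ParquetFileFormat are illustrative, not part of the ticket:

{code:cpp}
#include <arrow/dataset/api.h>
#include <arrow/filesystem/api.h>

arrow::Status WriteDataset(std::shared_ptr<arrow::dataset::Scanner> scanner,
                           std::shared_ptr<arrow::fs::FileSystem> fs) {
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();
  arrow::dataset::FileSystemDatasetWriteOptions options;
  options.filesystem = std::move(fs);
  options.base_dir = "/tmp/my_dataset";  // illustrative output directory
  options.file_write_options = format->DefaultWriteOptions();
  // The option the ticket singles out: the caller must spell out a basename
  // template even though "part{i}.parquet" would be an obvious default.
  options.basename_template = "part{i}.parquet";
  // (Partitioning and other options may also need to be configured.)
  return arrow::dataset::FileSystemDataset::Write(options, std::move(scanner));
}
{code}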
[GitHub] [arrow-adbc] lidavidm opened a new pull request, #65: [C] Basic libpq-based driver
lidavidm opened a new pull request, #65: URL: https://github.com/apache/arrow-adbc/pull/65

The driver supports basic queries (int32 only) and toggling autocommit. It does not yet support bulk ingestion or prepared statements. It hasn't been optimized for speed, and the approach taken here will not be fast (it uses the per-row getters). In future PRs, we should set up some benchmarks and then see if DuckDB's approach makes more sense (use `COPY`). DuckDB also does multithreading (that might be hard for us). We may want to implement #61 first, since then we will know whether it is safe to use `COPY` or not.
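For reference, the per-row getter pattern described above looks roughly like this (a minimal sketch: the table and column names are made up, connection setup and error handling are elided):

```cpp
#include <arpa/inet.h>  // ntohl
#include <libpq-fe.h>

#include <cstdint>
#include <cstring>

void ReadInt32Column(PGconn* conn) {
  // resultFormat = 1 asks libpq for binary values, so an INT4 arrives as a
  // big-endian 4-byte value rather than as text.
  PGresult* result = PQexecParams(conn, "SELECT some_int_column FROM some_table",
                                  /*nParams=*/0, nullptr, nullptr, nullptr,
                                  nullptr, /*resultFormat=*/1);
  for (int row = 0; row < PQntuples(result); row++) {
    if (PQgetisnull(result, row, /*column=*/0)) continue;
    // One libpq call per value: this is why the approach won't be fast.
    uint32_t big_endian;
    std::memcpy(&big_endian, PQgetvalue(result, row, 0), sizeof(big_endian));
    const int32_t value = static_cast<int32_t>(ntohl(big_endian));
    (void)value;  // ...append to an Arrow buffer...
  }
  PQclear(result);
}
```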
[GitHub] [arrow-adbc] lidavidm merged pull request #63: [C] Use nanoarrow to improve validation suite
lidavidm merged PR #63: URL: https://github.com/apache/arrow-adbc/pull/63
[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #19: ArrowArray consumer buffer helpers
paleolimbot merged PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19
[jira] [Created] (ARROW-17401) [C++] Add ReadTable method to RecordBatchFileReader
Will Jones created ARROW-17401:
-------------------------------

Summary: [C++] Add ReadTable method to RecordBatchFileReader
Key: ARROW-17401
URL: https://issues.apache.org/jira/browse/ARROW-17401
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 9.0.0
Reporter: Will Jones

For convenience, it would be helpful to add a method for reading the entire file as a table in one call.
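A hedged sketch of what a caller writes today with the existing APIs; the proposed ReadTable method would collapse the loop below into a single call:

{code:cpp}
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

arrow::Result<std::shared_ptr<arrow::Table>> ReadWholeFile(
    std::shared_ptr<arrow::io::RandomAccessFile> file) {
  ARROW_ASSIGN_OR_RAISE(auto reader, arrow::ipc::RecordBatchFileReader::Open(file));
  std::vector<std::shared_ptr<arrow::RecordBatch>> batches;
  batches.reserve(reader->num_record_batches());
  // Read every batch in the file's order, then stitch them into one table.
  for (int i = 0; i < reader->num_record_batches(); i++) {
    ARROW_ASSIGN_OR_RAISE(auto batch, reader->ReadRecordBatch(i));
    batches.push_back(std::move(batch));
  }
  return arrow::Table::FromRecordBatches(reader->schema(), std::move(batches));
}
{code}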
[jira] [Created] (ARROW-17400) [C++] Move Parquet APIs to use Result instead of Status
Will Jones created ARROW-17400:
-------------------------------

Summary: [C++] Move Parquet APIs to use Result instead of Status
Key: ARROW-17400
URL: https://issues.apache.org/jira/browse/ARROW-17400
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 9.0.0
Reporter: Will Jones

Notably, IPC and CSV have "open file" methods that return a Result, while opening a Parquet file requires passing in an out variable.
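The asymmetry, sketched side by side (signatures as of the 9.0-era headers; the file argument is assumed to be an already-opened RandomAccessFile):

{code:cpp}
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <parquet/arrow/reader.h>

arrow::Status OpenBoth(std::shared_ptr<arrow::io::RandomAccessFile> file) {
  // IPC already returns a Result:
  ARROW_ASSIGN_OR_RAISE(auto ipc_reader, arrow::ipc::RecordBatchFileReader::Open(file));
  (void)ipc_reader;

  // Parquet still fills an out-parameter and returns a Status:
  std::unique_ptr<parquet::arrow::FileReader> parquet_reader;
  ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(file, arrow::default_memory_pool(),
                                               &parquet_reader));
  return arrow::Status::OK();
}
{code}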
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #19: ArrowArray consumer buffer helpers
paleolimbot commented on code in PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19#discussion_r944665393

## src/nanoarrow/typedefs_inline.h:

@@ -166,6 +166,32 @@ enum ArrowType {
   NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO
 };

+/// \brief Functional types of buffers as described in the Arrow Columnar Specification
+enum ArrowBufferType {
+  NANOARROW_BUFFER_TYPE_NONE,
+  NANOARROW_BUFFER_TYPE_VALIDITY,
+  NANOARROW_BUFFER_TYPE_TYPE_ID,
+  NANOARROW_BUFFER_TYPE_UNION_OFFSET,
+  NANOARROW_BUFFER_TYPE_DATA_OFFSET,
+  NANOARROW_BUFFER_TYPE_DATA
+};
+
+/// \brief A description of an arrangement of buffers
+///
+/// Contains the minimum amount of information required to
+/// calculate the size of each buffer in an ArrowArray knowing only
+/// the length and offset of the array.
+struct ArrowLayout {
+  /// \brief The function of each buffer
+  enum ArrowBufferType buffer_type[3];
+
+  /// \brief The size of an element in each buffer, or 0 if the size is variable or unknown
+  int64_t element_size_bits[3];
+
+  /// \brief The fixed size of a child element
+  int64_t child_size_elements;

Review Comment: It's needed to calculate the length of a child of a fixed-size list (I should clarify + add a test for that, though, to make sure).
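To make the fixed-size-list case concrete, here is a sketch of the calculation `child_size_elements` enables; the helper name is invented for illustration and is not part of the PR:

```cpp
// For a fixed_size_list<T, N> array, child_size_elements is N, so the child
// array must contain at least (offset + length) * N elements. For types whose
// child length is independent of the parent, child_size_elements is 0.
static inline int64_t ChildLengthElements(const struct ArrowLayout* layout,
                                          int64_t offset, int64_t length) {
  return (offset + length) * layout->child_size_elements;
}
```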
[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #19: ArrowArray consumer buffer helpers
lidavidm commented on code in PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19#discussion_r944654967

## src/nanoarrow/typedefs_inline.h:

@@ -166,6 +166,32 @@ enum ArrowType {
   NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO
 };

+/// \brief Functional types of buffers as described in the Arrow Columnar Specification
+enum ArrowBufferType {
+  NANOARROW_BUFFER_TYPE_NONE,
+  NANOARROW_BUFFER_TYPE_VALIDITY,
+  NANOARROW_BUFFER_TYPE_TYPE_ID,
+  NANOARROW_BUFFER_TYPE_UNION_OFFSET,
+  NANOARROW_BUFFER_TYPE_DATA_OFFSET,
+  NANOARROW_BUFFER_TYPE_DATA
+};
+
+/// \brief A description of an arrangement of buffers
+///
+/// Contains the minimum amount of information required to
+/// calculate the size of each buffer in an ArrowArray knowing only
+/// the length and offset of the array.
+struct ArrowLayout {
+  /// \brief The function of each buffer
+  enum ArrowBufferType buffer_type[3];
+
+  /// \brief The size of an element in each buffer, or 0 if the size is variable or unknown
+  int64_t element_size_bits[3];
+
+  /// \brief The fixed size of a child element
+  int64_t child_size_elements;

Review Comment: How is this different from `element_size_bits`?
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #19: ArrowArray consumer buffer helpers
paleolimbot commented on code in PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19#discussion_r944649830

## src/nanoarrow/utils_inline.h:

@@ -26,6 +26,115 @@
 extern "C" {
 #endif

+static inline void ArrowLayoutInit(struct ArrowLayout* layout,

Review Comment: Done! I'm sure it *could* be header-only but that's a discussion/battle for another day that I'm not all that qualified to weigh in on.
[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #19: ArrowArray consumer buffer helpers
paleolimbot commented on code in PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19#discussion_r944646506

## src/nanoarrow/typedefs_inline.h:

@@ -179,6 +217,19 @@ struct ArrowStringView {
   int64_t n_bytes;
 };

+/// \brief A non-owning view of a buffer
+struct ArrowBufferView {
+  /// \brief A pointer to the start of the buffer
+  ///
+  /// If n_bytes is 0, this value may be NULL.
+  const union ArrowBufferDataPointer data;
+
+  /// \brief The size of the string in bytes,
+  ///
+  /// (Not including the null terminator.)
+  int64_t n_bytes;

Review Comment: Done!
[GitHub] [arrow-adbc] lidavidm merged pull request #62: [Format][C][Java] Add method to get parameter schema
lidavidm merged PR #62: URL: https://github.com/apache/arrow-adbc/pull/62
[GitHub] [arrow-adbc] lidavidm closed issue #60: [Format] Retrieve expected param binding information
lidavidm closed issue #60: [Format] Retrieve expected param binding information URL: https://github.com/apache/arrow-adbc/issues/60
[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface
lidavidm commented on issue #61: URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1213320396

Another reason to differentiate between queries with/without result sets: in a Postgres driver, that means we know when we can attempt to use `COPY (...) TO STDOUT (FORMAT binary)` (akin to DuckDB's integration with Postgres) to get bulk binary data instead of parsing data one row at a time.
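Roughly what that bulk path looks like with libpq (a minimal sketch: the query text is illustrative, and connection setup and parsing of the binary COPY header/tuples are elided):

```cpp
#include <libpq-fe.h>

void CopyOut(PGconn* conn) {
  PGresult* result =
      PQexec(conn, "COPY (SELECT * FROM my_table) TO STDOUT (FORMAT binary)");
  if (PQresultStatus(result) != PGRES_COPY_OUT) {
    PQclear(result);
    return;
  }
  PQclear(result);

  // PQgetCopyData hands back one COPY data message at a time; it returns the
  // message length, or -1 once the COPY is finished (-2 on error).
  char* buffer = nullptr;
  int num_bytes;
  while ((num_bytes = PQgetCopyData(conn, &buffer, /*async=*/0)) > 0) {
    // ...decode num_bytes bytes of binary COPY data from buffer...
    PQfreemem(buffer);
  }

  // Drain the final command-status result(s).
  while ((result = PQgetResult(conn)) != nullptr) {
    PQclear(result);
  }
}
```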
[GitHub] [arrow-nanoarrow] codecov-commenter commented on pull request #19: ArrowArray consumer buffer helpers
codecov-commenter commented on PR #19: URL: https://github.com/apache/arrow-nanoarrow/pull/19#issuecomment-1213263144

# [Codecov](https://codecov.io/gh/apache/arrow-nanoarrow/pull/19) Report

> Merging [#19](https://codecov.io/gh/apache/arrow-nanoarrow/pull/19) (e393e32) into main (3b30507) will **increase** coverage by `0.44%`.
> The diff coverage is `95.37%`.

```diff
@@            Coverage Diff             @@
##             main      #19      +/-   ##
==========================================
+ Coverage   90.64%   91.09%   +0.44%
==========================================
  Files           9       10       +1
  Lines        1037     1145     +108
  Branches       43       46       +3
==========================================
+ Hits          940     1043     +103
- Misses         63       66       +3
- Partials       34       36       +2
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/nanoarrow/array_view.c | `82.75% <82.75%> (ø)` | |
| src/nanoarrow/schema_view.c | `98.88% <100.00%> (+0.01%)` | :arrow_up: |
| src/nanoarrow/utils_inline.h | `100.00% <100.00%> (ø)` | |
[GitHub] [arrow-adbc] zeroshade commented on pull request #62: [Format][C][Java] Add method to get parameter schema
zeroshade commented on PR #62: URL: https://github.com/apache/arrow-adbc/pull/62#issuecomment-1213241281

:shipit:
[jira] [Created] (ARROW-17399) pyarrow may use a lot of memory to load a dataframe from parquet
Gianluca Ficarelli created ARROW-17399:
---------------------------------------

Summary: pyarrow may use a lot of memory to load a dataframe from parquet
Key: ARROW-17399
URL: https://issues.apache.org/jira/browse/ARROW-17399
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 9.0.0
Environment: linux
Reporter: Gianluca Ficarelli
Attachments: memory-profiler.png

When a pandas dataframe is loaded from a parquet file using pyarrow.parquet.read_table, the memory usage may grow a lot more than what should be needed to load the dataframe, and it's not freed until the dataframe is deleted. The problem is evident when the dataframe has a column containing lists or numpy arrays, while it seems absent (or not noticeable) if the column contains only integers or floats.

I'm attaching a simple script to reproduce the issue, and a graph created with memory-profiler showing the memory usage. In this example, the dataframe created with pandas needs around 1.2 GB, but the memory usage after loading it from parquet is around 16 GB. The items of the column are created as numpy arrays and not lists, to be consistent with the types loaded from parquet (pyarrow produces numpy arrays and not lists).

{code:python}
import gc
import time

import numpy as np
import pandas as pd
import pyarrow
import pyarrow.parquet
import psutil


def pyarrow_dump(filename, df, compression="snappy"):
    table = pyarrow.Table.from_pandas(df)
    pyarrow.parquet.write_table(table, filename, compression=compression)


def pyarrow_load(filename):
    table = pyarrow.parquet.read_table(filename)
    return table.to_pandas()


def print_mem(msg, start_time=time.monotonic(), process=psutil.Process()):
    # gc.collect()
    current_time = time.monotonic() - start_time
    rss = process.memory_info().rss / 2**20
    print(f"{msg:>3} time:{current_time:>10.1f} rss:{rss:>10.1f}")


if __name__ == "__main__":
    print_mem(0)
    rows = 5_000_000
    df = pd.DataFrame({"a": [np.arange(10) for i in range(rows)]})
    print_mem(1)
    pyarrow_dump("example.parquet", df)
    print_mem(2)
    del df
    print_mem(3)
    time.sleep(3)
    print_mem(4)
    df = pyarrow_load("example.parquet")
    print_mem(5)
    time.sleep(3)
    print_mem(6)
    del df
    print_mem(7)
    time.sleep(3)
    print_mem(8)
{code}

Run with memory-profiler:

{code:bash}
mprof run --multiprocess python test_pyarrow.py
{code}

Output:

{code}
mprof: Sampling memory every 0.1s
running new process
  0 time:       0.0 rss:     135.4
  1 time:       4.9 rss:    1252.2
  2 time:       7.1 rss:    1265.0
  3 time:       7.5 rss:     760.2
  4 time:      10.7 rss:     758.9
  5 time:      19.6 rss:   16745.4
  6 time:      22.6 rss:   16335.4
  7 time:      22.9 rss:   15833.0
  8 time:      25.9 rss:     955.0
{code}
[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #62: [Format][C][Java] Add method to get parameter schema
lidavidm commented on code in PR #62: URL: https://github.com/apache/arrow-adbc/pull/62#discussion_r944581552

## adbc.h:

@@ -746,6 +746,22 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
                                        struct ArrowArrayStream* values,
                                        struct AdbcError* error);

+/// \brief Get the schema for bound parameters.
+///
+/// This should be called after AdbcStatementPrepare. This retrieves
+/// an Arrow schema describing the number, names, and types of the
+/// parameters in a parameterized statement. Not all drivers will
+/// support this. If the name of a parameter cannot be determined,
+/// the name of the corresponding field in the schema will be an empty
+/// string. Similarly, if the type cannot be statically determined,
+/// the type of the corresponding field will be NA (NullType).

Review Comment: Good idea - updated the docstrings
[jira] [Created] (ARROW-17398) [R] Add support for %Z to strptime
Rok Mihevc created ARROW-17398:
-------------------------------

Summary: [R] Add support for %Z to strptime
Key: ARROW-17398
URL: https://issues.apache.org/jira/browse/ARROW-17398
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Rok Mihevc

While lubridate does not support the %Z flag for strptime, Arrow could. Changes to C++ kernels might be required for support on all platforms, but that shouldn't block implementation, as the kStrptimeSupportsZone flag can be used ([see proposal|https://github.com/apache/arrow/pull/13854#issuecomment-1212694663]).
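For a sense of what this would enable at the C++ compute level, a hedged sketch against the 9.0-era API; the format string and wrapper function are illustrative, and varying platform strptime support for %Z is exactly what the kStrptimeSupportsZone flag would gate:

{code:cpp}
#include <arrow/api.h>
#include <arrow/compute/api.h>

arrow::Result<arrow::Datum> ParseWithZone(std::shared_ptr<arrow::Array> strings) {
  // "%Z" matches a time zone name such as "UTC"; support varies by platform.
  arrow::compute::StrptimeOptions options("%Y-%m-%d %H:%M:%S %Z",
                                          arrow::TimeUnit::SECOND,
                                          /*error_is_null=*/false);
  return arrow::compute::CallFunction("strptime", {strings}, &options);
}
{code}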
[GitHub] [arrow-adbc] lidavidm opened a new issue, #64: [Format] Formalize thread safety guarantees
lidavidm opened a new issue, #64: URL: https://github.com/apache/arrow-adbc/issues/64

Things to consider:
- What do underlying APIs provide (libpq, sqlite, JDBC, ODBC, Flight SQL)?
- What do wrapper APIs expect (JDBC, ODBC, dbapi, Go's database library)?

Example: libpq disallows concurrent queries through a single PGconn, so multiple AdbcStatements can't be used if they share a connection (and the semantics of that get murky anyways) - but what should the behavior be?
[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #62: [Format][C][Java] Add method to get parameter schema
zeroshade commented on code in PR #62: URL: https://github.com/apache/arrow-adbc/pull/62#discussion_r944512305

## adbc.h:

@@ -746,6 +746,22 @@ AdbcStatusCode AdbcStatementBindStream(struct AdbcStatement* statement,
                                        struct ArrowArrayStream* values,
                                        struct AdbcError* error);

+/// \brief Get the schema for bound parameters.
+///
+/// This should be called after AdbcStatementPrepare. This retrieves
+/// an Arrow schema describing the number, names, and types of the
+/// parameters in a parameterized statement. Not all drivers will
+/// support this. If the name of a parameter cannot be determined,
+/// the name of the corresponding field in the schema will be an empty
+/// string. Similarly, if the type cannot be statically determined,
+/// the type of the corresponding field will be NA (NullType).

Review Comment: should we also explicitly state/define that the order of the columns in the schema should match the ordinal position of the parameters and if a named parameter is used multiple times in the query, it should only appear once in the schema?
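To illustrate the semantics under discussion, a hedged sketch of a client using the new call (function names follow the PR's adbc.h; setup and error handling are elided). For a query like `SELECT * FROM t WHERE a = ? AND b = ?`, a supporting driver would report one field per ordinal parameter position, with empty names where names can't be determined and NA where types can't be:

```cpp
#include <adbc.h>

void InspectParameters(struct AdbcStatement* statement, struct AdbcError* error) {
  AdbcStatementSetSqlQuery(statement, "SELECT * FROM t WHERE a = ? AND b = ?", error);
  AdbcStatementPrepare(statement, error);

  struct ArrowSchema params;
  AdbcStatementGetParameterSchema(statement, &params, error);
  // Expect params.n_children == 2 here: one field per ordinal position.
  params.release(&params);
}
```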
[GitHub] [arrow-adbc] lidavidm commented on pull request #62: [Format][C][Java] Add method to get parameter schema
lidavidm commented on PR #62: URL: https://github.com/apache/arrow-adbc/pull/62#issuecomment-1213030715

@zeroshade does this seem reasonable?
[jira] [Created] (ARROW-17397) [R] Does the R API for Apache Arrow have a tableFromIPC function?
Roy Assis created ARROW-17397:
------------------------------

Summary: [R] Does the R API for Apache Arrow have a tableFromIPC function?
Key: ARROW-17397
URL: https://issues.apache.org/jira/browse/ARROW-17397
Project: Apache Arrow
Issue Type: Improvement
Reporter: Roy Assis

I'm building an API using Python and Flask. I want to return a dataframe from the API, so I'm serializing the dataframe like this and sending it in the response:

{code:python}
batch = pa.record_batch(df)
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
pybytes = sink.getvalue().to_pybytes()
{code}

Is it possible to read this with R? If so, can you provide a code snippet?

Best,
Roy
[jira] [Created] (ARROW-17396) [C++][Dataset] Allow creating FileSystemDataset with FileInfoGenerator as a source
Pavel Solodovnikov created ARROW-17396:
---------------------------------------

Summary: [C++][Dataset] Allow creating FileSystemDataset with FileInfoGenerator as a source
Key: ARROW-17396
URL: https://issues.apache.org/jira/browse/ARROW-17396
Project: Apache Arrow
Issue Type: Sub-task
Reporter: Pavel Solodovnikov