[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-17 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1218147989

   Ok, I think we'll want to tweak the partitioning interface further after 
this refactor, see #68.
   
   Also I suppose this supersedes the 'affected row count' issue in #55.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-16 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1217131457

   Ok, while digging I realized we also need to account for partitioning 
(=Flight/Flight SQL's ability to return multiple endpoints in a GetFlightInfo). 
So what I'm tentatively about to refactor towards is this:
   
   ```c
   /// \brief Execute a statement and get the results.
   ///
   /// This invalidates any prior result sets.
   ///
   /// \param[in] statement The statement to execute.
   /// \param[in] out_type The expected result type:
   ///   - ADBC_OUTPUT_TYPE_NONE if the query should not generate a
   /// result set;
   ///   - ADBC_OUTPUT_TYPE_ARROW for an ArrowArrayStream;
   ///   - ADBC_OUTPUT_TYPE_PARTITIONS for a count of partitions (see \ref
   /// adbc-statement-partition below).
   ///   The result set will be in out.
   /// \param[out] out The results. Must be NULL for output type NONE, a
   ///   pointer to an ArrowArrayStream for ARROW_ARRAY_STREAM, or a
   ///   pointer to a size_t for PARTITIONS.
   /// \param[out] rows_affected The number of rows affected if known,
   ///   else -1. Pass NULL if the client does not want this information.
   /// \param[out] error An optional location to return an error
   ///   message if necessary.
   ADBC_EXPORT
   AdbcStatusCode AdbcStatementExecute(struct AdbcStatement* statement, int 
output_type,
   void* out, int64_t* rows_affected,
   struct AdbcError* error);
   
   /// \brief No results are expected from AdbcStatementExecute.  Pass
   ///   NULL to out.
   #define ADBC_OUTPUT_TYPE_NONE 0
   /// \brief Arrow data is expected from AdbcStatementExecute.  Pass
   ///   ArrowArrayStream* to out.
   #define ADBC_OUTPUT_TYPE_ARROW 1
   /// \brief Partitions are expected from AdbcStatementExecute.  Pass
   ///   size_t* to out to get the number of partitions, and use
   ///   AdbcStatementGetPartitionDesc to get a partition.
   ///
   /// Drivers are not required to support partitioning.  In that case,
   /// AdbcStatementExecute will return ADBC_STATUS_NOT_IMPLEMENTED.
   #define ADBC_OUTPUT_TYPE_PARTITIONS 2
   ```
   
   _If_ we decide to add native support for "affected row IDs", that could also 
go here. Otherwise I don't expect that we'd need more enum variants here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-15 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1215630552

   Ok, so how does this sound (here, I'm ignoring #59 provide a "just query" 
method):
   
   Remove `AdbcStatementGetStream`
   Change `AdbcStatementExecute` to `AdbcStatementExecute(struct 
AdbcStatement*, struct ArrowArrayStream*, size_t*, struct AdbcError*)` where 
the ArrowArrayStream is for the result set and the size_t is for the rows 
affected (if known, else 0)
   Change the various `AdbcConnectionGetInfo` methods to take `struct 
ArrowArrayStream*` as their output instead of `struct AdbcStatement*` (but 
clarify they still may count as a 'statement' for the purposes of concurrency - 
that is, they may acquire the same locks, etc. internally)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-14 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1214424587

   Right, and on the other hand, databases like SQLite have no reliable way to 
get the info. But APIs like JDBC, Python DBAPI, and Go's database API expose 
standard ways to get last inserted ID(s). I wasn't originally concerned with it 
since it feels like the 'wrong' use case to worry about so I'm fine ignoring it.
   
   I think what's being discussed here is basically to have the same things as 
DBI, and the `Query` you propose seems sufficient? I suppose a header-only 
library cannot provide an (overridable) convenience function, so `Query` would 
have to be a full-fledged API that can be stubbed by the driver manager.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-14 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1214419966

   > Just to be sure we're on the same page:
   > 
   > * a "query" is a single SQL string that can return a result set but 
doesn't have to
   > 
   > * a "statement" is the result of preparing an SQL string with 
placeholders, so that parameters can be added
   > 
   > * we're debating whether the caller should declare up front if a 
result is expected
   > 
   > 
   > In R DBI we're using the term "query" to indicate something that returns a 
result set (with or without parameters), and a "statement" doesn't return a 
result set but we can query the number of rows affected.
   
   That sounds reasonable. (`adbc.h` should define some terminology up front!) 
Though I'm still leaning towards having a separate `Statement` struct even for 
things that aren't necessarily prepared statements, just because that makes it 
easier to add options (as suggested) without potentially breaking ABI. 
   
   > The more information we can share with the driver up front, the better. Do 
we really need two methods, though -- would a single `Query(struct 
AdbcConnection*, const char*, struct ArrowArrayStream*, struct AdbcError*)` 
with an optional third argument (can be `NULL`) also work? Or perhaps an option 
argument that indicates if we expect a result set, and return metadata (rows 
affected etc.) via the arrow stream?
   
   A single method sounds reasonable. The `Statement` struct can keep the 
fine-grained APIs. I'd possibly argue that if you want more metadata, you 
should resort to the lower level APIs. Maybe we can just add the row count 
parameter as well, for convenience?
   
   ```c
   Query(struct AdbcConnection*, const char*, struct ArrowArrayStream*, 
size_t*, struct AdbcError*)
   ```
   
   If the query returns a result set, the ArrowArrayStream contains the result 
set and the size_t is the number of rows (if known, else 0). If not, the 
ArrowArrayStream is not set (or: contains last inserted IDs, if supported and 
configured) and the size_t contains rows affected.
   
   > 
   > What should happen to queries that indicate that no result set is expected 
but where a result set is available? Is a warning useful or annoying?
   
   I think we can just ignore the result set in that case. This also makes it 
easy to support "Retrieve the last inserted id for inserts into an 
auto-increment table" that @zeroshade mentioned in #55: the caller can set an 
option to return the inserted IDs, and they'll come back via the result set.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-12 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1213320396

   Another reason to differentiate between queries with/without result sets: in 
a Postgres driver, that means we know when we can attempt to use `COPY (...) TO 
STDOUT (FORMAT binary)` (akin to DuckDB's integration with Postgres) to get 
bulk binary data instead of parsing data one row at a time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-10 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1210994922

   CC @pitrou, @hannes, @krlmlr if you have opinions here? 
   
   @lwhite1 had the same feedback about executeQuery/execute in Java last 
month. So for consistency a query method returning a result set makes sense at 
the very least.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-10 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1210989539

   Also, possibly the driver manager could define execute-with-result-set and 
execute-with-rows-affected in terms of the generic execute + generic getters to 
retrieve the affected rows/last row ID. So a basic driver (e.g. SQLite, which 
doesn't differentiate) would be straightforward to implement still but clients 
can be mostly none the wiser. It does increase the actual API surface to have 
all those permutations though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] lidavidm commented on issue #61: [Format] Simplify Execute and Query interface

2022-08-10 Thread GitBox


lidavidm commented on issue #61:
URL: https://github.com/apache/arrow-adbc/issues/61#issuecomment-1210987487

   I think it may still have sense to have a generic Execute to ease 
compatibility with APIs that do not differentiate between the types of queries 
(and note JDBC has all three!), but having an execute-with-rows-affected and 
execute-with-result-set is reasonable. 
   
   Concurrency is a separate discussion; I'm torn on whether we should push the 
complexity into the driver (and declare up front that, for example, everything 
must be thread-safe), or declare that everything is _not_ thread-safe and leave 
clients to deal with it (so Go would have to lock the statement to use it), or 
(like DBAPI in Python) provide a way for drivers to indicate what they support 
(this is probably the worst of both worlds though). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org