Hey Steve -

You could look at Flight SQL, which uses this pattern: https://arrow.apache.org/docs/format/FlightSql.html#sequence-diagrams

That way, a system can execute the query on multiple machines and let the client fetch data from all of those machines, instead of having to proxy data through a central endpoint. Dremio implements (and originally contributed) Flight SQL in their product.
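
To make the two-step flow concrete, here is a toy, in-process sketch in plain Python. It is not the pyarrow.flight API and does no networking: `Endpoint`, `Worker`, `Coordinator`, `fetch_all`, and the `grpc://worker-N` locations are all hypothetical names made up for illustration. The only point is that one planning call hands back several (ticket, location) pairs, and the client then pulls each partition straight from the worker that holds it.

```python
# Toy model of the GetFlightInfo -> DoGet flow. No pyarrow, no network;
# every class and name below is a hypothetical stand-in, not a real API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ticket: bytes   # opaque to the client; only the servers interpret it
    location: str   # which worker holds this partition of the result


class Worker:
    """Stands in for one data server answering DoGet."""

    def __init__(self):
        self._partitions = {}  # ticket -> list of rows

    def store(self, ticket, rows):
        self._partitions[ticket] = rows

    def do_get(self, ticket):
        return self._partitions[ticket]


class Coordinator:
    """Stands in for the planning server answering GetFlightInfo."""

    def __init__(self, workers, table):
        self._workers = workers  # location -> Worker
        self._table = table      # the full result set, as a list of rows

    def get_flight_info(self, query):
        # Pretend to execute the query by splitting the rows round-robin
        # across the workers, then return one endpoint per partition.
        locations = sorted(self._workers)
        endpoints = []
        for i, location in enumerate(locations):
            ticket = f"{query}#{i}".encode()
            self._workers[location].store(ticket, self._table[i::len(locations)])
            endpoints.append(Endpoint(ticket, location))
        return endpoints


def fetch_all(coordinator, cluster, query):
    # Step 1: a single planning call to the coordinator.
    endpoints = coordinator.get_flight_info(query)
    # Step 2: fetch every partition directly from its worker, in parallel.
    # Looking up cluster[ep.location] stands in for connecting to that server.
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda ep: cluster[ep.location].do_get(ep.ticket), endpoints)
        return [row for part in parts for row in part]
```

In this sketch the coordinator only plans; the rows themselves come back from the workers, which is the property that lets a real deployment avoid funnelling all data through one endpoint.
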
The BigQuery Storage API, while not Flight-based, is also gRPC/Arrow-based and does something similar (creating a ReadSession returns multiple ReadStreams, which the client then fetches individually): https://github.com/googleapis/googleapis/blob/master/google/cloud/bigquery/storage/v1/stream.proto

There is nothing stopping you from sending a ticket directly if that fits the needs of your own service, although it wouldn't fit the recommended pattern.

There is also a proposal to let data be directly inlined into FlightInfo to get the 'best of both worlds': https://github.com/apache/arrow/pull/12571

-David

On Sat, Oct 1, 2022, at 18:28, Steve Kim wrote:
> I am exploring how to implement an API with Arrow Flight, and I am having
> difficulty understanding why GetFlightInfo is a separate step from DoGet. I
> imagine a Ticket to be a serialized application-specific request that is
> understood by the server(s), and I don't understand why the client should
> have to send a pre-flight command in a FlightDescriptor to obtain tickets,
> instead of directly sending tickets to the server(s). Can someone share a
> real-world, non-contrived example of a Flight service that relies on this
> feature of the protocol to achieve performance, scalability, etc. goals?
>
> Thanks,
> Steve
