It depends on the structure. Arrow still stores structs in a column-oriented layout, so you still get the benefits of Arrow and its column orientation while being able to reference lists and structs properly.
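To make the column-oriented point concrete, here is a rough sketch in plain Python. It deliberately does not use the Arrow API; it only mimics the idea that a struct column is kept as a validity list plus one child column per field, and a list column as offsets into one flat values column. The row contents (`pos`, `symbol`, `qty`, `tags`) and both helper functions are hypothetical, made up for illustration.

```python
# Plain-Python illustration only; this is NOT the Arrow/pyarrow API.
# Hypothetical rows: a struct field "pos" {symbol, qty} and a list field "tags".
rows = [
    {"pos": {"symbol": "AAPL", "qty": 10}, "tags": ["a", "b"]},
    {"pos": None, "tags": []},
    {"pos": {"symbol": "MSFT", "qty": 5}, "tags": ["c"]},
]


def struct_to_columns(values, fields):
    """Split dicts (or None) into a validity list plus one child column per field."""
    validity = [v is not None for v in values]
    children = {
        f: [v[f] if v is not None else None for v in values]
        for f in fields
    }
    return validity, children


def list_to_offsets(values):
    """Flatten a sequence of lists into offsets plus one contiguous values column."""
    offsets, flat = [0], []
    for v in values:
        flat.extend(v)
        offsets.append(len(flat))
    return offsets, flat


pos_validity, pos_children = struct_to_columns(
    [r["pos"] for r in rows], ["symbol", "qty"]
)
tag_offsets, tag_values = list_to_offsets([r["tags"] for r in rows])

print(pos_validity)            # which rows have a non-null struct
print(pos_children["qty"])     # every qty value sits in one contiguous column
print(tag_offsets, tag_values) # row i's list is tag_values[offsets[i]:offsets[i+1]]
```

A consumer that only needs `qty` can scan that single child column without touching `symbol` or `tags`, which is the benefit that survives nesting.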
That said, you do still need to keep in mind how your consumers will want to use the data when you design your GraphQL schema, so that they rarely have to copy or re-orient the data from the Arrow Arrays.

--Matt

From: Rollo Konig-Brock <[email protected]>
Sent: Wednesday, May 11, 2022 2:08 PM
To: [email protected]
Subject: Re: GraphQL over Arrow (+ Flight)?

Is it actually a good idea to send complex nested data structures over Arrow? Don’t you lose a lot of its benefits? Just asking why? Genuinely curious as I have some data that is ostensibly time series (a time series of trade positions, each snapshot is a hashmap differing in cardinality so it can’t feasibly be split into a stream per key).

On 11 May 2022, at 16:27, Gavin Ray <[email protected]> wrote:

> For the Request, we have a Protobuf message that consists of two strings: the
> GraphQL query and an optional JSON string for variable definitions. We
> marshal the protobuf message to bytes which are used as the “ticket” for the
> `DoGet` request through Flight.

Ah okay, so for the request you would just follow the standard "/graphql" query object, with/without "operationName"

> Because Arrow already contains complex types like nested structs / lists /
> etc. it’s not too difficult to construct an arrow Schema from the expected
> GraphQL response schema and just return a stream of record batches.

Nice to know this is doable, it seemed like it might be overly complicated to write the transform

> Since pretty much every existing GraphQL engine outputs JSON right now, we’ve
> essentially built our own execution engine at this point by utilizing the
> planner from
> https://pkg.go.dev/github.com/jensneuse/graphql-go-tools
> and a custom built execution layer to execute the generated plan.

This library is really neat, thanks for posting. Seems to have wrapper datafetchers too for REST/GQL/static datasources, which is nice.

I'd be using "graphql-java", where a resolver/datafetcher can return arbitrary types so I don't think the JSON bit would be a hangup at least -- could directly return record batches from query execution.

On Wed, May 11, 2022 at 10:55 AM Matthew Topol <[email protected]> wrote:

So I’m actually doing this currently in production for a service, as I spoke about in a talk at the Subsurface conference.

For the Request, we have a Protobuf message that consists of two strings: the GraphQL query and an optional JSON string for variable definitions. We marshal the protobuf message to bytes which are used as the “ticket” for the `DoGet` request through Flight.

Since pretty much every existing GraphQL engine outputs JSON right now, we’ve essentially built our own execution engine at this point by utilizing the planner from https://pkg.go.dev/github.com/jensneuse/graphql-go-tools and a custom built execution layer to execute the generated plan.

Because Arrow already contains complex types like nested structs / lists / etc.
it’s not too difficult to construct an Arrow Schema from the expected GraphQL response schema and just return a stream of record batches.

--Matt

From: Gavin Ray <[email protected]>
Sent: Wednesday, May 11, 2022 10:30 AM
To: [email protected]
Subject: GraphQL over Arrow (+ Flight)?

If anyone is familiar with both GraphQL and Arrow, I'm curious how exactly using these two together might look.

GraphQL is transport-agnostic, so you can theoretically use it over anything, a good case study being Dan Luu's article here:

https://danluu.com/simple-architectures/

> "Some areas where we’re happy with our choices even though they may not
> sound like the simplest feasible solution are with our API, where we use
> GraphQL, with our transport protocols, where we had a custom protocol for a
> while, and our host management, where we use Kubernetes. For our transport
> protocols, we used to use a custom protocol that runs on top of UDP, with an
> SMS and USSD fallback, for the performance reasons described in this talk.
> With the rollout of HTTP/3, we’ve been able to replace our custom protocol
> with HTTP/3 and we generally only need USSD for events like the recent
> internet shutdowns in Mali)."

I've also seen GraphQL done over Protobuf/gRPC, TCP/MsgPack, and a custom binary format:

- https://github.com/google/rejoiner/blob/b1cb09e9bbf7ac68bfd9c93f23a73b691e6ead72/examples-gradle/src/main/java/com/google/api/graphql/examples/streaming/graphqlserver/GraphQlGrpcServer.java#L44
- https://github.com/OlegIlyenko/sangria-tcp-msgpack-example
- https://github.com/esseswann/graphql-binary

If someone were interested in using Arrow as the encoding layer, how would this work in practice?

Arrow messages need to have a well-defined schema, and GraphQL queries return dynamic, nested data, so I'm having a hard time understanding how you'd go about representing/encoding that in an Arrow message.

Thank you =)
