It depends on the structure. Arrow still stores structs in a column-oriented 
layout, so you keep the benefits of Arrow and its column orientation while 
still being able to reference lists and structs correctly.
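
To make the column-oriented point concrete, here is a minimal sketch in plain Python. It does not use any Arrow library; ordinary lists stand in for Arrow buffers, and the records, field names, and helper functions are hypothetical. It only illustrates the idea that a struct column is a group of equal-length child columns, and a list column is an offsets sequence plus one flat child column.

```python
# Illustrative only: plain Python lists stand in for Arrow buffers.
# The records and field names below are made up for this sketch.
rows = [
    {"id": 1, "price": {"bid": 10.0, "ask": 10.5}, "tags": ["a", "b"]},
    {"id": 2, "price": {"bid": 11.0, "ask": 11.2}, "tags": []},
    {"id": 3, "price": {"bid": 9.5, "ask": 9.9}, "tags": ["c"]},
]


def to_columns(rows):
    """Split row-oriented records into one contiguous sequence per leaf field."""
    tag_offsets = [0]
    tag_values = []
    for row in rows:
        tag_values.extend(row["tags"])
        tag_offsets.append(len(tag_values))
    return {
        "id": [r["id"] for r in rows],
        # A struct column is a group of child columns of equal length.
        "price": {
            "bid": [r["price"]["bid"] for r in rows],
            "ask": [r["price"]["ask"] for r in rows],
        },
        # A list column is an offsets sequence plus one flat child column.
        "tags": {"offsets": tag_offsets, "values": tag_values},
    }


def tags_of(columns, i):
    """Read the list value of row i by slicing the flat child column."""
    tags = columns["tags"]
    return tags["values"][tags["offsets"][i]:tags["offsets"][i + 1]]


columns = to_columns(rows)
```

Every leaf field (`price.bid`, `price.ask`, the flattened `tags` values) ends up contiguous, which is why scanning one nested field does not require touching the others.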

That said, when you design your GraphQL schema you still need to keep in mind 
how your consumers will want to use the data, so that they rarely have to copy 
or re-orient the data from the Arrow Arrays.
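
As a rough illustration of that trade-off for the position-snapshot case raised in this thread, the sketch below lays out a time series of maps the way Arrow's Map type is laid out physically (a list of key/value structs: offsets plus flat key and value columns). It is plain Python with no Arrow library, and the timestamps, symbols, quantities, and function names are all hypothetical.

```python
# Illustrative only: hypothetical snapshots, each a map of differing size.
snapshots = [
    ("2022-05-11T14:00:00", {"AAPL": 100, "MSFT": -50}),
    ("2022-05-11T14:01:00", {"AAPL": 120}),
    ("2022-05-11T14:02:00", {"AAPL": 120, "MSFT": -50, "GOOG": 10}),
]


def flatten_snapshots(snapshots):
    """Return (timestamps, offsets, keys, values) as flat columns."""
    timestamps, offsets, keys, values = [], [0], [], []
    for ts, positions in snapshots:
        timestamps.append(ts)
        for symbol, quantity in positions.items():
            keys.append(symbol)
            values.append(quantity)
        # offsets[i]..offsets[i + 1] delimits the entries of snapshot i.
        offsets.append(len(keys))
    return timestamps, offsets, keys, values


def series_for(symbol, timestamps, offsets, keys, values):
    """Collect (timestamp, quantity) pairs for one symbol across snapshots."""
    out = []
    for i, ts in enumerate(timestamps):
        for j in range(offsets[i], offsets[i + 1]):
            if keys[j] == symbol:
                out.append((ts, values[j]))
    return out


timestamps, offsets, keys, values = flatten_snapshots(snapshots)
msft = series_for("MSFT", timestamps, offsets, keys, values)
```

If consumers mostly ask for one symbol over time, this layout means a scan over `keys` on every read; a schema that exposes a flat (timestamp, symbol, quantity) table might suit them better. Which one is right depends on the access pattern, which is the point above.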

--Matt

From: Rollo Konig-Brock <[email protected]>
Sent: Wednesday, May 11, 2022 2:08 PM
To: [email protected]
Subject: Re: GraphQL over Arrow (+ Flight)?

Is it actually a good idea to send complex nested data structures over Arrow?  
Don’t you lose a lot of its benefits?

Just asking why? Genuinely curious as I have some data that is ostensibly time 
series (a time series of trade positions, each snapshot is a hashmap differing 
in cardinality so it can’t feasibly be split into a stream per key).


On 11 May 2022, at 16:27, Gavin Ray <[email protected]> wrote:

> For the Request, we have a Protobuf message that consists of two strings: the 
> GraphQL query and an optional JSON string for variable definitions. We 
> marshal the protobuf message to bytes which are used as the “ticket” for the 
> `DoGet` request through Flight.

Ah okay, so for the request you would just follow the standard "/graphql" query 
object, with/without "operationName"

>  Because Arrow already contains complex types like nested structs / lists / 
> etc. it’s not too difficult to construct an arrow Schema from the expected 
> GraphQL response schema and just return a stream of record batches.

Nice to know this is doable, it seemed like it might be overly complicated to 
write the transform

> Since pretty much every existing GraphQL engine outputs JSON right now, we’ve 
> essentially built our own execution engine at this point by utilizing the 
> planner from 
> https://pkg.go.dev/github.com/jensneuse/graphql-go-tools
>  and a custom built execution layer to execute the generated plan.

This library is really neat, thanks for posting.
Seems to have wrapper datafetchers too for REST/GQL/static datasources, which 
is nice.

I'd be using "graphql-java", where a resolver/datafetcher can return arbitrary 
types, so I don't think the JSON bit would be a hangup at least -- could 
directly return record batches from query execution.


On Wed, May 11, 2022 at 10:55 AM Matthew Topol <[email protected]> wrote:
So I’m actually doing this currently in production for a service, as I spoke 
about in a talk at the Subsurface conference.

For the Request, we have a Protobuf message that consists of two strings: the 
GraphQL query and an optional JSON string for variable definitions. We marshal 
the protobuf message to bytes which are used as the “ticket” for the `DoGet` 
request through Flight.
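
A minimal sketch of what such a ticket could look like on the wire, assuming a hypothetical message with `query` as string field 1 and `variables` as string field 2 (the actual message definition is not given in this thread). It hand-encodes the Protobuf wire format for length-delimited fields using only the Python standard library; a real service would use generated Protobuf code and pass the resulting bytes as the Flight ticket.

```python
# Assumed (hypothetical) message layout:
#   message GraphQLRequest { string query = 1; string variables = 2; }


def encode_varint(n):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


def make_ticket(query, variables_json=""):
    """Serialize the request; empty strings are omitted, as in proto3."""
    ticket = encode_string_field(1, query)
    if variables_json:
        ticket += encode_string_field(2, variables_json)
    return ticket


def parse_ticket(ticket):
    """Inverse of make_ticket; handles only length-delimited fields 1 and 2."""
    fields = {1: "", 2: ""}
    pos = 0
    while pos < len(ticket):
        tag, pos = decode_varint(ticket, pos)
        length, pos = decode_varint(ticket, pos)
        fields[tag >> 3] = ticket[pos:pos + length].decode("utf-8")
        pos += length
    return fields[1], fields[2]
```

On the server side, the `DoGet` handler would parse the ticket bytes back into the query and variables before planning the query.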

Since pretty much every existing GraphQL engine outputs JSON right now, we’ve 
essentially built our own execution engine at this point by utilizing the 
planner from 
https://pkg.go.dev/github.com/jensneuse/graphql-go-tools
 and a custom built execution layer to execute the generated plan. Because 
Arrow already contains complex types like nested structs / lists / etc. it’s 
not too difficult to construct an arrow Schema from the expected GraphQL 
response schema and just return a stream of record batches.
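
One way to picture the schema construction: for a given query, the selection set fixes the response shape before execution, so it can be walked once to derive nested Arrow types. The sketch below is plain Python and only produces informal type strings (not any Arrow library's API); the `response_shape` description, the scalar mapping table, and the function names are assumptions for illustration, not the production code described above.

```python
# GraphQL built-in scalars mapped to informal Arrow type names.
# (GraphQL Int is a signed 32-bit integer; Float is double precision.)
SCALARS = {
    "Int": "int32",
    "Float": "float64",
    "String": "utf8",
    "Boolean": "bool",
    "ID": "utf8",
}


def arrow_type(shape):
    """Map a response shape to an Arrow-style type string.

    shape is a scalar name (str), a one-element list (GraphQL list type),
    or a dict of field name -> shape (GraphQL object selection).
    """
    if isinstance(shape, str):
        return SCALARS[shape]
    if isinstance(shape, list):
        (item,) = shape
        return f"list<{arrow_type(item)}>"
    fields = ", ".join(f"{name}: {arrow_type(t)}" for name, t in shape.items())
    return f"struct<{fields}>"


def arrow_schema(shape):
    """One top-level Arrow field per top-level field of the query."""
    return {name: arrow_type(t) for name, t in shape.items()}


# Hypothetical shape for: { trader { id positions { symbol quantity price } } }
response_shape = {
    "trader": {
        "id": "ID",
        "positions": [{"symbol": "String", "quantity": "Int", "price": "Float"}],
    }
}

schema = arrow_schema(response_shape)
```

Nullability, custom scalars, unions, and interfaces are left out of this sketch and would need their own mapping decisions.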

--Matt

From: Gavin Ray <[email protected]>
Sent: Wednesday, May 11, 2022 10:30 AM
To: [email protected]
Subject: GraphQL over Arrow (+ Flight)?

If anyone is familiar with both GraphQL and Arrow, I'm curious how exactly using
these two together might look

GraphQL is transport-agnostic, so you can theoretically use it over anything, a
good case study being Dan Luu's article here:

https://danluu.com/simple-architectures/

  > "Some areas where we’re happy with our choices even though they may not
  > sound like the simplest feasible solution are with our API, where we use
  > GraphQL, with our transport protocols, where we had a custom protocol for a
  > while, and our host management, where we use Kubernetes. For our transport
  > protocols, we used to use a custom protocol that runs on top of UDP, with an
  > SMS and USSD fallback, for the performance reasons described in this talk.
  > With the rollout of HTTP/3, we’ve been able to replace our custom protocol
  > with HTTP/3 and we generally only need USSD for events like the recent
  > internet shutdowns in Mali)."

I've also seen GraphQL done over Protobuf/gRPC, TCP/MsgPack, and a custom binary
format:

- https://github.com/google/rejoiner/blob/b1cb09e9bbf7ac68bfd9c93f23a73b691e6ead72/examples-gradle/src/main/java/com/google/api/graphql/examples/streaming/graphqlserver/GraphQlGrpcServer.java#L44
- https://github.com/OlegIlyenko/sangria-tcp-msgpack-example
- https://github.com/esseswann/graphql-binary

If someone were interested in using Arrow as the encoding layer, how would this
work in practice?

Arrow messages need to have a well-defined schema, and GraphQL
queries return dynamic, nested data, so I'm having a hard time understanding how
you'd go about representing/encoding that in an Arrow message.

Thank you =)
