Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-18 Thread Ian Cook
Thanks to everyone who has contributed to this work so far. We now have
simple HTTP client/server examples in 10 languages, all tested and verified
to interoperate:
https://github.com/apache/arrow-experiments/tree/main/http/get_simple

There is an umbrella issue tracking the next planned tasks:
https://github.com/apache/arrow/issues/40465

Ian

On Tue, Mar 5, 2024 at 11:01 PM Ian Cook  wrote:

> Update on recent progress in this Arrow-over-HTTP project:
>
> I cleaned up the minimal examples of HTTP clients and servers and
> moved them into a directory in the Arrow Experiments repo:
> https://github.com/apache/arrow-experiments/tree/main/http
>
> So far there are client examples in six languages and server examples
> in two languages (Python and Go). They all have READMEs describing how
> to use them.
>
> I have an open PR that adds a third server example in Java. Reviews
> appreciated:
> https://github.com/apache/arrow-experiments/pull/4
>
> I would like to see minimal client and server examples in a few more
> languages (especially Rust) before we move on to developing richer
> types of examples. Is anyone interested in contributing additional
> minimal examples?
>
> Thanks,
> Ian
>
> On Wed, Dec 6, 2023 at 2:29 PM Ian Cook  wrote:
> >
> > I just remembered that there is an unused "Arrow Experiments" repo [1]
> > which Wes created a few years ago [2]. That seems like a more
> > appropriate place to open PRs like this one. If there are no
> > objections, I will start using that repo for these Arrow-over-HTTP
> > PRs.
> >
> > [1] https://github.com/apache/arrow-experiments
> > [2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg
> >
> > Ian
> >
> > On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
> > >
> > > Antoine,
> > >
> > > Thank you for taking a look. I agree—these are basic examples intended
> > > to prove the concept and answer fundamental questions. Next I intend
> > > to expand the set of examples to cover more complex cases.
> > >
> > > > This might necessitate some kind of framing layer, or a
> > > > standardized delimiter.
> > >
> > > I am interested to hear more perspectives on this. My perspective is
> > > that we should recommend using HTTP conventions to keep clean
> > > separation between the Arrow-formatted binary data payloads and the
> > > various application-specific fields. This can be achieved by encoding
> > > application-specific fields in URI paths, query parameters, headers,
> > > or separate parts of multipart/form-data messages.
> > >
> > > Ian
> > >
> > > > On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou wrote:
> > > >
> > > >
> > > > Hi,
> > > >
> > > > While this looks like a nice start, I would expect more precise
> > > > recommendations for writing non-trivial services. Especially, one
> > > > > question is how to send both an application-specific POST request and an
> > > > > Arrow stream, or an application-specific GET response and an Arrow
> > > > stream. This might necessitate some kind of framing layer, or a
> > > > standardized delimiter.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> > > > Le 05/12/2023 à 21:10, Ian Cook a écrit :
> > > > > > This is a continuation of the discussion entitled "[DISCUSS] Protocol for
> > > > > > exchanging Arrow data over REST APIs". See the previous messages at
> > > > > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> > > > > >
> > > > > > To inform this discussion, I created some basic Arrow-over-HTTP client and
> > > > > > server examples here:
> > > > > > https://github.com/apache/arrow/pull/39081
> > > > > >
> > > > > > My intention is to expand and improve this set of examples (with your help)
> > > > > > until they reflect a set of conventions that we are comfortable documenting
> > > > > > as recommendations.
> > > > >
> > > > > Please take a look and add comments / suggestions in the PR.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > > > > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington wrote:
> > > > > >
> > > > > >> I also think a set of best practices for Arrow over HTTP would be a
> > > > > >> valuable resource for the community...even if it never becomes a
> > > > > >> specification of its own, it will be beneficial for API developers and
> > > > > >> consumers of those APIs to have a place to look to understand how
> > > > > >> Arrow can help improve throughput/latency/maybe other things. Possibly
> > > > > >> something like httpbin.org but for requests/responses that use Arrow
> > > > > >> would be helpful as well. Thank you Ian for leading this effort!
> > > > > >>
> > > > > >> It has mostly been covered already, but in the (ubiquitous) situation
> > > > > >> where a response contains some schema/table and some non-schema/table
> > > > > >> information there is some tension between throughput (best served by a
> > > > > >> JSON response plus one or more IPC stream responses) and latency (best
> > > > > >> served by a single HTTP response? JSON? IPC with

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-11 Thread Bryce Mecum
I'd be happy to contribute C# and Ruby examples. I'll work on those this week.

On Tue, Mar 5, 2024 at 7:03 PM Ian Cook  wrote:
>
> Update on recent progress in this Arrow-over-HTTP project:
>
> I cleaned up the minimal examples of HTTP clients and servers and
> moved them into a directory in the Arrow Experiments repo:
> https://github.com/apache/arrow-experiments/tree/main/http
>
> So far there are client examples in six languages and server examples
> in two languages (Python and Go). They all have READMEs describing how
> to use them.
>
> I have an open PR that adds a third server example in Java. Reviews 
> appreciated:
> https://github.com/apache/arrow-experiments/pull/4
>
> I would like to see minimal client and server examples in a few more
> languages (especially Rust) before we move on to developing richer
> types of examples. Is anyone interested in contributing additional
> minimal examples?
>
> Thanks,
> Ian

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-11 Thread Andrew Lamb
Update -- turns out there was already a Rust client/server -- linked to the
ticket now

On Mon, Mar 11, 2024 at 3:07 PM Andrew Lamb  wrote:

> I sadly don't have time to help with this directly, however, I did file a
> ticket with the request to help with a Rust prototype [1]. Hopefully we'll
> get a taker
>
> [1] https://github.com/apache/arrow-rs/issues/5496

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-11 Thread Andrew Lamb
I sadly don't have time to help with this directly; however, I did file a
ticket requesting help with a Rust prototype [1]. Hopefully we'll
get a taker.

[1] https://github.com/apache/arrow-rs/issues/5496

On Tue, Mar 5, 2024 at 11:03 PM Ian Cook  wrote:

> Update on recent progress in this Arrow-over-HTTP project:
>
> I cleaned up the minimal examples of HTTP clients and servers and
> moved them into a directory in the Arrow Experiments repo:
> https://github.com/apache/arrow-experiments/tree/main/http
>
> So far there are client examples in six languages and server examples
> in two languages (Python and Go). They all have READMEs describing how
> to use them.
>
> I have an open PR that adds a third server example in Java. Reviews
> appreciated:
> https://github.com/apache/arrow-experiments/pull/4
>
> I would like to see minimal client and server examples in a few more
> languages (especially Rust) before we move on to developing richer
> types of examples. Is anyone interested in contributing additional
> minimal examples?
>
> Thanks,
> Ian

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-05 Thread Ian Cook
Update on recent progress in this Arrow-over-HTTP project:

I cleaned up the minimal examples of HTTP clients and servers and
moved them into a directory in the Arrow Experiments repo:
https://github.com/apache/arrow-experiments/tree/main/http

So far there are client examples in six languages and server examples
in two languages (Python and Go). They all have READMEs describing how
to use them.

I have an open PR that adds a third server example in Java. Reviews appreciated:
https://github.com/apache/arrow-experiments/pull/4

I would like to see minimal client and server examples in a few more
languages (especially Rust) before we move on to developing richer
types of examples. Is anyone interested in contributing additional
minimal examples?

Thanks,
Ian

On Wed, Dec 6, 2023 at 2:29 PM Ian Cook  wrote:
>
> I just remembered that there is an unused "Arrow Experiments" repo [1]
> which Wes created a few years ago [2]. That seems like a more
> appropriate place to open PRs like this one. If there are no
> objections, I will start using that repo for these Arrow-over-HTTP
> PRs.
>
> [1] https://github.com/apache/arrow-experiments
> [2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg
>
> Ian
>
> On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
> >
> > Antoine,
> >
> > Thank you for taking a look. I agree—these are basic examples intended
> > to prove the concept and answer fundamental questions. Next I intend
> > to expand the set of examples to cover more complex cases.
> >
> > > This might necessitate some kind of framing layer, or a
> > > standardized delimiter.
> >
> > I am interested to hear more perspectives on this. My perspective is
> > that we should recommend using HTTP conventions to keep clean
> > separation between the Arrow-formatted binary data payloads and the
> > various application-specific fields. This can be achieved by encoding
> > application-specific fields in URI paths, query parameters, headers,
> > or separate parts of multipart/form-data messages.
> >
> > Ian
> >
> > On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou  wrote:
> > >
> > >
> > > Hi,
> > >
> > > While this looks like a nice start, I would expect more precise
> > > recommendations for writing non-trivial services. Especially, one
> > > question is how to send both an application-specific POST request and an
> > > Arrow stream, or an application-specific GET response and an Arrow
> > > stream. This might necessitate some kind of framing layer, or a
> > > standardized delimiter.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> > > Le 05/12/2023 à 21:10, Ian Cook a écrit :
> > > > > This is a continuation of the discussion entitled "[DISCUSS] Protocol for
> > > > > exchanging Arrow data over REST APIs". See the previous messages at
> > > > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> > > > >
> > > > > To inform this discussion, I created some basic Arrow-over-HTTP client and
> > > > > server examples here:
> > > > > https://github.com/apache/arrow/pull/39081
> > > > >
> > > > > My intention is to expand and improve this set of examples (with your help)
> > > > > until they reflect a set of conventions that we are comfortable documenting
> > > > > as recommendations.
> > > >
> > > > Please take a look and add comments / suggestions in the PR.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > > > > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington wrote:
> > > >
> > > >> I also think a set of best practices for Arrow over HTTP would be a
> > > >> valuable resource for the community...even if it never becomes a
> > > >> specification of its own, it will be beneficial for API developers and
> > > >> consumers of those APIs to have a place to look to understand how
> > > >> Arrow can help improve throughput/latency/maybe other things. Possibly
> > > >> something like httpbin.org but for requests/responses that use Arrow
> > > >> would be helpful as well. Thank you Ian for leading this effort!
> > > >>
> > > >> It has mostly been covered already, but in the (ubiquitous) situation
> > > >> where a response contains some schema/table and some non-schema/table
> > > >> information there is some tension between throughput (best served by a
> > > >> JSON response plus one or more IPC stream responses) and latency (best
> > > >> served by a single HTTP response? JSON? IPC with metadata/header?). In
> > > >> addition to Antoine's list, I would add:
> > > >>
> > > >> - How to serve the same table in multiple requests (e.g., to saturate
> > > >> a network connection, or because separate worker nodes are generating
> > > >> results anyway).
> > > >> - How to inline a small schema/table into a single request with other
> > > >> metadata (I have seen this done as base64-encoded IPC in JSON, but
> > > >> perhaps there is a better way)
> > > >>
> > > >> If anybody is interested in experimenting, I repurposed a previous
> > > >> experiment I had as a flask app that can stream 
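
For reference, the "base64-encoded IPC in JSON" approach mentioned in the
quoted list above might look roughly like the sketch below. It uses only
the Python standard library. The field name "arrow_ipc_base64", the
"query_id" metadata key, and the placeholder bytes are all assumptions for
illustration; in a real service the bytes would come from an Arrow IPC
stream writer.

```python
import base64
import json


def inline_ipc(metadata, ipc_bytes):
    """Return a JSON document carrying metadata plus base64-encoded IPC bytes."""
    doc = dict(metadata)
    # "arrow_ipc_base64" is a hypothetical field name, not an Arrow convention.
    doc["arrow_ipc_base64"] = base64.b64encode(ipc_bytes).decode("ascii")
    return json.dumps(doc)


def extract_ipc(body):
    """Recover the raw IPC bytes from a document produced by inline_ipc."""
    return base64.b64decode(json.loads(body)["arrow_ipc_base64"])


# Placeholder standing in for the bytes of a small Arrow IPC stream.
ipc_bytes = b"\xff\xff\xff\xff placeholder"
body = inline_ipc({"query_id": "q1"}, ipc_bytes)
```

Base64 inflates the payload by about one third, so this probably only
makes sense for small schemas or tables.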

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-02-02 Thread Alessandro Molina
On Wed, Dec 6, 2023 at 7:45 PM Ian Cook  wrote:

>
> I am interested to hear more perspectives on this. My perspective is
> that we should recommend using HTTP conventions to keep clean
> separation between the Arrow-formatted binary data payloads and the
> various application-specific fields. This can be achieved by encoding
> application-specific fields in URI paths, query parameters, headers,
> or separate parts of multipart/form-data messages.
>
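
As a rough illustration of the separation described in the quoted text, the
sketch below builds (but does not send) a POST request whose body contains
only Arrow IPC stream bytes, with application-specific fields carried in the
URI path, the query string, and a header. It uses only the Python standard
library. The URL layout, the "mode" parameter, the "X-Request-Id" header,
and the placeholder body are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media type registered with IANA for the Arrow IPC streaming format.
ARROW_STREAM = "application/vnd.apache.arrow.stream"


def build_upload_request(base_url, dataset, params, arrow_body):
    """Return a POST request whose body is only Arrow IPC stream bytes."""
    # Application-specific fields go in the path and the query string ...
    url = f"{base_url}/datasets/{dataset}?{urlencode(params)}"
    return Request(
        url,
        data=arrow_body,
        method="POST",
        headers={
            "Content-Type": ARROW_STREAM,
            # ... or in a header (hypothetical header name and value).
            "X-Request-Id": "example-123",
        },
    )


req = build_upload_request(
    "http://localhost:8000",
    "trips",
    {"mode": "append"},
    b"<arrow ipc stream bytes>",  # placeholder, not a real IPC stream
)
```

Passing req to urllib.request.urlopen would send it; that step is left out
so the sketch does not depend on a running server.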

Submitting big binary data in POST messages via multipart/form-data is
usually not very performant. In theory, the boundary of the message has to
be constructed by verifying that it does not collide with the content of
the data itself, which for huge files means traversing the whole file in
search of bytes matching the boundary.
Many implementations are optimistic, relying on the fact that there is very
little chance that a long enough randomly generated boundary will be
contained in the message, but this is not guaranteed to be true, and I
would refrain from suggesting an approach that has a chance, however
remote, of being slow or not working.
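
To make the boundary concern concrete, here is a minimal sketch of the two
strategies, using only the Python standard library. The function names and
the sample payload are made up for illustration. The optimistic strategy
picks a random boundary and does not look at the data; the strict strategy
scans the entire payload for the delimiter, which is the full traversal
described above.

```python
import secrets


def make_boundary(nbytes=16):
    """Optimistic strategy: a random boundary, chosen without reading the data."""
    # 16 random bytes as hex; a collision is very unlikely but not impossible.
    return secrets.token_hex(nbytes).encode("ascii")


def boundary_collides(boundary, payload):
    """Strict strategy: scan the whole payload for the delimiter."""
    # A multipart delimiter is "--" followed by the boundary string.
    return (b"--" + boundary) in payload


# Made-up payload that happens to contain the delimiter for boundary "abc123".
payload = b"\x00" * 1024 + b"--abc123" + b"\x01" * 1024
```

The cost of boundary_collides grows with the payload size, which is the
overhead in question for very large uploads.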

Also most HTTP servers tend to implement a maximum request time to reduce
the risk of exhausting the maximum
available connections with broken (or malicious) clients that leave the
connection open for too long.
So uploading a 1GB file in a single POST is at serious risk of failing in
most deployments.

There is also the issue that for multipart/form-data a maximum transferred
data size exists, as the content of files is frequently saved in a
temporary file by the HTTP server before it gets forwarded to the
server-side application. This opens the system to an out-of-disk error if
a client uploads too much data and no limit is configured.

So I would suggest that any recommended approach to submitting Arrow data
via HTTP rely on Content-Range and chunked uploads to transmit the data,
thus reducing the risk of timeouts or size limits and allowing a client to
simply resend a chunk if one of those occurs.
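
A minimal sketch of what the client side of such a chunked upload could
look like, using only the Python standard library. It only computes the
headers and the byte slice for each chunk; it assumes a server that accepts
ranged uploads, which is application-defined behavior and not something
every HTTP server provides. The function name and the chunk size are
illustrative.

```python
def content_range_chunks(payload, chunk_size):
    """Yield (headers, chunk) pairs that cover payload in order."""
    total = len(payload)
    for start in range(0, total, chunk_size):
        chunk = payload[start:start + chunk_size]
        end = start + len(chunk) - 1  # the end offset in Content-Range is inclusive
        headers = {
            "Content-Range": f"bytes {start}-{end}/{total}",
            "Content-Length": str(len(chunk)),
        }
        yield headers, chunk


# Ten bytes split into chunks of at most four bytes.
parts = list(content_range_chunks(b"x" * 10, 4))
```

Because each chunk carries its own byte range, a client can resend only
the chunk that failed instead of restarting the whole upload.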


Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-01-08 Thread Wes McKinney
hi all — I was just catching up on e-mail threads and wanted to give a few
historical comments on this.

When we were assembling the Arrow PMC and committing to do the project in
2015, standardizing Arrow-over-REST was always something that was on the
TODO list — at that time we didn't have the IPC protocol yet, so that was
the fundamental design/engineering that had to take place. I agree that
having example code and well-documented patterns for using REST+Arrow in
production would make it easier for people to adopt Arrow in their systems
for transport, and it would have been better to do this years ago to help
onboard users into the ecosystem faster (and make the "getting started"
part of this less of a DIY affair).

When Jacques and I did the original design / prototyping for Flight-on-gRPC
(2018), the goal was not to convey that as "the Preferred Way" for network
transport (which could steer people away from directly using HTTP, though
perhaps that was an unintentional consequence because of our effort
developing and promoting Flight), but rather to establish generic patterns
for creating distributed Arrow services using gRPC, and to optimize the
serialization aspect (i.e. avoiding extra protobuf encoding/decoding steps
which would be present if you naively used gRPC and put the Arrow IPC
format in a protobuf message as a blob).

In any case, I think having a "Rosetta stone"-type setup of starter code
for building HTTP services that send and receive Arrow would be a help to
developers/users who want to adopt Arrow in their systems.

Thanks
Wes

On Wed, Dec 6, 2023 at 1:30 PM Ian Cook  wrote:

> I just remembered that there is an unused "Arrow Experiments" repo [1]
> which Wes created a few years ago [2]. That seems like a more
> appropriate place to open PRs like this one. If there are no
> objections, I will start using that repo for these Arrow-over-HTTP
> PRs.
>
> [1] https://github.com/apache/arrow-experiments
> [2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg
>
> Ian
>
> On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
> >
> > Antoine,
> >
> > Thank you for taking a look. I agree—these are basic examples intended
> > to prove the concept and answer fundamental questions. Next I intend
> > to expand the set of examples to cover more complex cases.
> >
> > > This might necessitate some kind of framing layer, or a
> > > standardized delimiter.
> >
> > I am interested to hear more perspectives on this. My perspective is
> > that we should recommend using HTTP conventions to keep clean
> > separation between the Arrow-formatted binary data payloads and the
> > various application-specific fields. This can be achieved by encoding
> > application-specific fields in URI paths, query parameters, headers,
> > or separate parts of multipart/form-data messages.
> >
> > Ian
> >
> > On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou 
> wrote:
> > >
> > >
> > > Hi,
> > >
> > > While this looks like a nice start, I would expect more precise
> > > recommendations for writing non-trivial services. Especially, one
> > > question is how to send both an application-specific POST request and
> an
> > > Arrow stream, or an application-specific GET response and an Arrow
> > > stream. This might necessitate some kind of framing layer, or a
> > > standardized delimiter.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >
> > > Le 05/12/2023 à 21:10, Ian Cook a écrit :
> > > > This is a continuation of the discussion entitled "[DISCUSS]
> Protocol for
> > > > exchanging Arrow data over REST APIs". See the previous messages at
> > > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> > > >
> > > > To inform this discussion, I created some basic Arrow-over-HTTP
> client and
> > > > server examples here:
> > > > https://github.com/apache/arrow/pull/39081
> > > >
> > > > My intention is to expand and improve this set of examples (with
> your help)
> > > > until they reflect a set of conventions that we are comfortable
> documenting
> > > > as recommendations.
> > > >
> > > > Please take a look and add comments / suggestions in the PR.
> > > >
> > > > Thanks,
> > > > Ian
> > > >
> > > > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
> > > >  wrote:
> > > >
> > > >> I also think a set of best practices for Arrow over HTTP would be a
> > > >> valuable resource for the community...even if it never becomes a
> > > > >> specification of its own, it will be beneficial for API developers and
> > > > >> consumers of those APIs to have a place to look to understand how
> > > > >> Arrow can help improve throughput/latency/maybe other things. Possibly
> > > > >> something like httpbin.org but for requests/responses that use Arrow
> > > > >> would be helpful as well. Thank you Ian for leading this effort!
> > > > >>
> > > > >> It has mostly been covered already, but in the (ubiquitous) situation
> > > > >> where a response contains some schema/table and some non-schema/table
> > > >> information there is 

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2023-12-06 Thread Ian Cook
I just remembered that there is an unused "Arrow Experiments" repo [1]
which Wes created a few years ago [2]. That seems like a more
appropriate place to open PRs like this one. If there are no
objections, I will start using that repo for these Arrow-over-HTTP
PRs.

[1] https://github.com/apache/arrow-experiments
[2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg

Ian

On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
>
> Antoine,
>
> Thank you for taking a look. I agree—these are basic examples intended
> to prove the concept and answer fundamental questions. Next I intend
> to expand the set of examples to cover more complex cases.
>
> > This might necessitate some kind of framing layer, or a
> > standardized delimiter.
>
> I am interested to hear more perspectives on this. My perspective is
> that we should recommend using HTTP conventions to keep clean
> separation between the Arrow-formatted binary data payloads and the
> various application-specific fields. This can be achieved by encoding
> application-specific fields in URI paths, query parameters, headers,
> or separate parts of multipart/form-data messages.
>
> Ian
>
> On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou  wrote:
> >
> >
> > Hi,
> >
> > While this looks like a nice start, I would expect more precise
> > recommendations for writing non-trivial services. Especially, one
> > question is how to send both an application-specific POST request and an
> > Arrow stream, or an application-specific GET response and an Arrow
> > stream. This might necessitate some kind of framing layer, or a
> > standardized delimiter.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On 05/12/2023 at 21:10, Ian Cook wrote:
> > > This is a continuation of the discussion entitled "[DISCUSS] Protocol for
> > > exchanging Arrow data over REST APIs". See the previous messages at
> > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> > >
> > > To inform this discussion, I created some basic Arrow-over-HTTP client and
> > > server examples here:
> > > https://github.com/apache/arrow/pull/39081
> > >
> > > My intention is to expand and improve this set of examples (with your help)
> > > until they reflect a set of conventions that we are comfortable documenting
> > > as recommendations.
> > >
> > > Please take a look and add comments / suggestions in the PR.
> > >
> > > Thanks,
> > > Ian
> > >
> > > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
> > >  wrote:
> > >
> > >> I also think a set of best practices for Arrow over HTTP would be a
> > >> valuable resource for the community...even if it never becomes a
> > >> specification of its own, it will be beneficial for API developers and
> > >> consumers of those APIs to have a place to look to understand how
> > >> Arrow can help improve throughput/latency/maybe other things. Possibly
> > >> something like httpbin.org but for requests/responses that use Arrow
> > >> would be helpful as well. Thank you Ian for leading this effort!
> > >>
> > >> It has mostly been covered already, but in the (ubiquitous) situation
> > >> where a response contains some schema/table and some non-schema/table
> > >> information there is some tension between throughput (best served by a
> > >> JSON response plus one or more IPC stream responses) and latency (best
> > >> served by a single HTTP response? JSON? IPC with metadata/header?). In
> > >> addition to Antoine's list, I would add:
> > >>
> > >> - How to serve the same table in multiple requests (e.g., to saturate
> > >> a network connection, or because separate worker nodes are generating
> > >> results anyway).
> > >> - How to inline a small schema/table into a single request with other
> > >> metadata (I have seen this done as base64-encoded IPC in JSON, but
> > >> perhaps there is a better way)
> > >>
> > >> If anybody is interested in experimenting, I repurposed a previous
> > >> experiment I had as a flask app that can stream IPC to a client:
> > >>
> > >> https://github.com/paleolimbot/2023-11-21_arrow-over-http-scratchpad/pull/1/files
> > >> .
> > >>
> > >>> - recommendations about compression
> > >>
> > >> Just a note that there is also Content-Encoding: gzip (for consumers
> > >> like Arrow JS that don't currently support buffer compression but that
> > >> can leverage the facilities of the browser/http library)
> > >>
> > >> Cheers!
> > >>
> > >> -dewey
> > >>
> > >>
> > >> On Mon, Nov 20, 2023 at 8:30 PM Sutou Kouhei  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>>> But how is the performance?
> > >>>
> > >>> It's faster than the original JSON based API.
> > >>>
> > >>> I implemented Apache Arrow support for a C# client. So I
> > >>> measured only with Apache Arrow C# but the Apache Arrow
> > >>> based API is faster than JSON based API.
> > >>>
> > >>>> Have you measured the throughput of this approach to see
> > >>>> if it is comparable to using Flight SQL?
> > >>>
> > >>> Sorry. I didn't measure the throughput. In the case, 

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2023-12-06 Thread Ian Cook
Antoine,

Thank you for taking a look. I agree—these are basic examples intended
to prove the concept and answer fundamental questions. Next I intend
to expand the set of examples to cover more complex cases.

> This might necessitate some kind of framing layer, or a
> standardized delimiter.

I am interested to hear more perspectives on this. My perspective is
that we should recommend using HTTP conventions to keep clean
separation between the Arrow-formatted binary data payloads and the
various application-specific fields. This can be achieved by encoding
application-specific fields in URI paths, query parameters, headers,
or separate parts of multipart/form-data messages.
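
As a minimal sketch of the last option (this is not taken from the examples
in the PR), the following builds a two-part multipart/form-data body: one
JSON part for application-specific fields and one part carrying the Arrow IPC
stream bytes untouched. The part names "params" and "data", the filename, and
the helper name build_multipart are assumptions made for illustration; the
Arrow bytes are a placeholder rather than a real IPC stream.

```python
import json
import uuid

# Registered media type for the Arrow IPC streaming format.
ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"


def build_multipart(fields, arrow_bytes, boundary=""):
    """Return (content_type, body) for a two-part multipart/form-data message."""
    boundary = boundary or uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    lines = [
        # Part 1: application-specific fields, kept out of the Arrow payload.
        delim,
        b'Content-Disposition: form-data; name="params"',
        b"Content-Type: application/json",
        b"",
        json.dumps(fields).encode("utf-8"),
        # Part 2: the Arrow IPC stream, passed through unmodified.
        delim,
        b'Content-Disposition: form-data; name="data"; filename="data.arrows"',
        b"Content-Type: " + ARROW_STREAM_TYPE.encode("ascii"),
        b"",
        arrow_bytes,
        # Closing delimiter.
        delim + b"--",
        b"",
    ]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)


if __name__ == "__main__":
    content_type, body = build_multipart({"query": "x"}, b"\x00\x01ARROW")
    print(content_type)
    print(len(body), "bytes")
```

The boundary must not occur inside the binary part; a random boundary makes a
collision unlikely but a careful implementation would check for it.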

Ian

On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou  wrote:
>
>
> Hi,
>
> While this looks like a nice start, I would expect more precise
> recommendations for writing non-trivial services. Especially, one
> question is how to send both an application-specific POST request and an
> Arrow stream, or an application-specific GET response and an Arrow
> stream. This might necessitate some kind of framing layer, or a
> standardized delimiter.
>
> Regards
>
> Antoine.
>
>
>
> On 05/12/2023 at 21:10, Ian Cook wrote:
> > This is a continuation of the discussion entitled "[DISCUSS] Protocol for
> > exchanging Arrow data over REST APIs". See the previous messages at
> > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> >
> > To inform this discussion, I created some basic Arrow-over-HTTP client and
> > server examples here:
> > https://github.com/apache/arrow/pull/39081
> >
> > My intention is to expand and improve this set of examples (with your help)
> > until they reflect a set of conventions that we are comfortable documenting
> > as recommendations.
> >
> > Please take a look and add comments / suggestions in the PR.
> >
> > Thanks,
> > Ian
> >
> > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
> >  wrote:
> >
> >> I also think a set of best practices for Arrow over HTTP would be a
> >> valuable resource for the community...even if it never becomes a
> >> specification of its own, it will be beneficial for API developers and
> >> consumers of those APIs to have a place to look to understand how
> >> Arrow can help improve throughput/latency/maybe other things. Possibly
> >> something like httpbin.org but for requests/responses that use Arrow
> >> would be helpful as well. Thank you Ian for leading this effort!
> >>
> >> It has mostly been covered already, but in the (ubiquitous) situation
> >> where a response contains some schema/table and some non-schema/table
> >> information there is some tension between throughput (best served by a
> >> JSON response plus one or more IPC stream responses) and latency (best
> >> served by a single HTTP response? JSON? IPC with metadata/header?). In
> >> addition to Antoine's list, I would add:
> >>
> >> - How to serve the same table in multiple requests (e.g., to saturate
> >> a network connection, or because separate worker nodes are generating
> >> results anyway).
> >> - How to inline a small schema/table into a single request with other
> >> metadata (I have seen this done as base64-encoded IPC in JSON, but
> >> perhaps there is a better way)
> >>
> >> If anybody is interested in experimenting, I repurposed a previous
> >> experiment I had as a flask app that can stream IPC to a client:
> >>
> >> https://github.com/paleolimbot/2023-11-21_arrow-over-http-scratchpad/pull/1/files
> >> .
> >>
> >>> - recommendations about compression
> >>
> >> Just a note that there is also Content-Encoding: gzip (for consumers
> >> like Arrow JS that don't currently support buffer compression but that
> >> can leverage the facilities of the browser/http library)
> >>
> >> Cheers!
> >>
> >> -dewey
> >>
> >>
> >> On Mon, Nov 20, 2023 at 8:30 PM Sutou Kouhei  wrote:
> >>>
> >>> Hi,
> >>>
> >>>> But how is the performance?
> >>>
> >>> It's faster than the original JSON based API.
> >>>
> >>> I implemented Apache Arrow support for a C# client. So I
> >>> measured only with Apache Arrow C# but the Apache Arrow
> >>> based API is faster than JSON based API.
> >>>
> >>>> Have you measured the throughput of this approach to see
> >>>> if it is comparable to using Flight SQL?
> >>>
> >>> Sorry. I didn't measure the throughput. In this case, the elapsed
> >>> time of one request/response pair is more important than
> >>> throughput. It was faster than the JSON based API and its
> >>> performance was sufficient.
> >>>
> >>> I couldn't compare to a Flight SQL based approach because
> >>> Groonga doesn't support Flight SQL yet.
> >>>
> >>>> Is this approach able to saturate a fast network
> >>>> connection?
> >>>
> >>> I think that we can't measure this with the Groonga case
> >>> because the Groonga case doesn't send data without
> >>> stopping. Here is one of the request patterns:
> >>>
> >>> 1. Groonga has log data partitioned by day
> >>> 2. Groonga does full text search against one partition (2023-11-01)
> >>> 3. Groonga sends the 

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2023-12-06 Thread Antoine Pitrou



Hi,

While this looks like a nice start, I would expect more precise 
recommendations for writing non-trivial services. Especially, one 
question is how to send both an application-specific POST request and an 
Arrow stream, or an application-specific GET response and an Arrow 
stream. This might necessitate some kind of framing layer, or a 
standardized delimiter.


Regards

Antoine.



On 05/12/2023 at 21:10, Ian Cook wrote:

This is a continuation of the discussion entitled "[DISCUSS] Protocol for
exchanging Arrow data over REST APIs". See the previous messages at
https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.

To inform this discussion, I created some basic Arrow-over-HTTP client and
server examples here:
https://github.com/apache/arrow/pull/39081

My intention is to expand and improve this set of examples (with your help)
until they reflect a set of conventions that we are comfortable documenting
as recommendations.

Please take a look and add comments / suggestions in the PR.

Thanks,
Ian

On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
 wrote:


I also think a set of best practices for Arrow over HTTP would be a
valuable resource for the community...even if it never becomes a
specification of its own, it will be beneficial for API developers and
consumers of those APIs to have a place to look to understand how
Arrow can help improve throughput/latency/maybe other things. Possibly
something like httpbin.org but for requests/responses that use Arrow
would be helpful as well. Thank you Ian for leading this effort!

It has mostly been covered already, but in the (ubiquitous) situation
where a response contains some schema/table and some non-schema/table
information there is some tension between throughput (best served by a
JSON response plus one or more IPC stream responses) and latency (best
served by a single HTTP response? JSON? IPC with metadata/header?). In
addition to Antoine's list, I would add:

- How to serve the same table in multiple requests (e.g., to saturate
a network connection, or because separate worker nodes are generating
results anyway).
- How to inline a small schema/table into a single request with other
metadata (I have seen this done as base64-encoded IPC in JSON, but
perhaps there is a better way)
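
A minimal sketch of the base64-in-JSON approach mentioned in the second
point, using only the Python standard library. The key name
"arrow_ipc_base64" and the helper names are assumptions for illustration, and
the payload is arbitrary bytes standing in for a small serialized IPC stream.

```python
import base64
import json


def inline_ipc(metadata, ipc_bytes):
    """Embed IPC bytes as a base64 string next to other JSON metadata."""
    doc = dict(metadata)
    doc["arrow_ipc_base64"] = base64.b64encode(ipc_bytes).decode("ascii")
    return json.dumps(doc)


def extract_ipc(text):
    """Split a JSON document back into (metadata, IPC bytes)."""
    doc = json.loads(text)
    ipc_bytes = base64.b64decode(doc.pop("arrow_ipc_base64"))
    return doc, ipc_bytes


if __name__ == "__main__":
    payload = bytes(range(256))  # placeholder, not a real IPC stream
    text = inline_ipc({"rows": 3}, payload)
    meta, restored = extract_ipc(text)
    print(meta, restored == payload)
```

Base64 expands the payload by about one third, which is tolerable for a small
schema or table but is one reason to look for a better way with larger data.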

If anybody is interested in experimenting, I repurposed a previous
experiment I had as a flask app that can stream IPC to a client:

https://github.com/paleolimbot/2023-11-21_arrow-over-http-scratchpad/pull/1/files
.


> - recommendations about compression


Just a note that there is also Content-Encoding: gzip (for consumers
like Arrow JS that don't currently support buffer compression but that
can leverage the facilities of the browser/http library)
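
A minimal sketch of what this looks like at the HTTP level, using only the
Python standard library: the server compresses the whole response body and
labels it with Content-Encoding, and the client (or its HTTP library) undoes
the encoding before handing the bytes to an Arrow reader. The helper names
are hypothetical and the payload is a placeholder for IPC stream bytes.

```python
import gzip


def gzip_response(ipc_bytes):
    """Return (headers, body) with the body compressed at the HTTP layer."""
    body = gzip.compress(ipc_bytes)
    headers = {
        "Content-Type": "application/vnd.apache.arrow.stream",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(body)),
    }
    return headers, body


def read_response(headers, body):
    """Undo Content-Encoding so the caller sees the original IPC bytes."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    original = b"abc" * 1000  # placeholder, not a real IPC stream
    headers, body = gzip_response(original)
    print(headers["Content-Length"], read_response(headers, body) == original)
```

Because the compression sits outside the Arrow format, the reader does not
need to support Arrow buffer compression at all.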

Cheers!

-dewey


On Mon, Nov 20, 2023 at 8:30 PM Sutou Kouhei  wrote:


Hi,


> But how is the performance?


It's faster than the original JSON based API.

I implemented Apache Arrow support for a C# client. So I
measured only with Apache Arrow C# but the Apache Arrow
based API is faster than JSON based API.


> Have you measured the throughput of this approach to see
> if it is comparable to using Flight SQL?

Sorry. I didn't measure the throughput. In this case, the elapsed
time of one request/response pair is more important than
throughput. It was faster than the JSON based API and its
performance was sufficient.

I couldn't compare to a Flight SQL based approach because
Groonga doesn't support Flight SQL yet.


> Is this approach able to saturate a fast network
> connection?

I think that we can't measure this with the Groonga case
because the Groonga case doesn't send data without
stopping. Here is one of the request patterns:

1. Groonga has log data partitioned by day
2. Groonga does full text search against one partition (2023-11-01)
3. Groonga sends the result to client as Apache Arrow
   streaming format record batches
4. Groonga does full text search against the next partition (2023-11-02)
5. Groonga sends the result to client as Apache Arrow
   streaming format record batches
6. ...

In this case, the result data isn't sent continuously (search
-> send -> search -> send -> ...), so it doesn't saturate a
fast network connection.
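
The search -> send loop above can be sketched as a generator that emits one
HTTP/1.1 chunk (Transfer-Encoding: chunked) per partition. This is not
Groonga's implementation: search_partition is a hypothetical stand-in that
returns placeholder bytes, whereas a real server would first send the schema
message and then the record batch messages of a single IPC stream.

```python
def search_partition(day):
    # Hypothetical stand-in for a full text search whose result would be
    # serialized as Arrow streaming format record batches.
    return ("<batches for %s>" % day).encode("ascii")


def chunked_body(days):
    """Yield chunked-encoding frames, one chunk per partition."""
    for day in days:
        data = search_partition(day)                # search ...
        yield b"%x\r\n%s\r\n" % (len(data), data)   # ... then send
    yield b"0\r\n\r\n"  # a zero-length chunk terminates the body


def decode_chunked(raw):
    """Split a complete chunked body (no trailers) back into its chunks."""
    chunks, pos = [], 0
    while True:
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol], 16)  # chunk size is written in hexadecimal
        if size == 0:
            return chunks
        start = eol + 2
        chunks.append(raw[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data


if __name__ == "__main__":
    raw = b"".join(chunked_body(["2023-11-01", "2023-11-02"]))
    print(decode_chunked(raw))
```

The connection is idle while each search runs, which is the gap described
above. HTTP chunk boundaries do not have to line up with Arrow message
boundaries; one chunk per partition is only a convenience here.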

(3. and 4. can be parallel but it's not implemented yet.)

If we optimize this approach, this approach may be able to
saturate a fast network connection.


> And what about the case in which the server wants to begin sending batches
> to the client before the total number of result batches / records is known?


Ah, sorry. I forgot to explain the case. Groonga uses the
above approach for it.


> - server should not return the result data in the body of a response to a
> query request; instead server should return a response body that gives
> URI(s) at which clients can GET the result data


If we want to do this, the standard "Location" HTTP header
may be suitable.
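
A minimal sketch of that idea using only the Python standard library: the
query endpoint answers with a Location header instead of result data, and the
client then issues a GET for the result. The 303 status code, the paths
/query and /results/1, and the placeholder payload are assumptions chosen for
illustration; nothing in this thread fixes them.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

RESULT = b"<arrow ipc stream bytes>"  # placeholder, not a real IPC stream


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the query, then point the client at the result URI.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(303)
        self.send_header("Location", "/results/1")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        if self.path != "/results/1":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/vnd.apache.arrow.stream")
        self.send_header("Content-Length", str(len(RESULT)))
        self.end_headers()
        self.wfile.write(RESULT)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, method, path, body=None):
    """Send one request on a fresh connection; return (status, headers, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, path, body=body)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        status, headers, _ = fetch(server.server_port, "POST", "/query", b"q=arrow")
        location = headers["Location"]
        _, result_headers, data = fetch(server.server_port, "GET", location)
        return status, location, result_headers["Content-Type"], data
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```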


> - transmit result data in chunks (Transfer-Encoding: chunked), with
> recommendations about chunk size


Ah, sorry. I forgot to explain this case too. 

[DISCUSS] Conventions for transporting Arrow data over HTTP

2023-12-05 Thread Ian Cook
This is a continuation of the discussion entitled "[DISCUSS] Protocol for
exchanging Arrow data over REST APIs". See the previous messages at
https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.

To inform this discussion, I created some basic Arrow-over-HTTP client and
server examples here:
https://github.com/apache/arrow/pull/39081

My intention is to expand and improve this set of examples (with your help)
until they reflect a set of conventions that we are comfortable documenting
as recommendations.

Please take a look and add comments / suggestions in the PR.

Thanks,
Ian

On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
 wrote:

> I also think a set of best practices for Arrow over HTTP would be a
> valuable resource for the community...even if it never becomes a
> specification of its own, it will be beneficial for API developers and
> consumers of those APIs to have a place to look to understand how
> Arrow can help improve throughput/latency/maybe other things. Possibly
> something like httpbin.org but for requests/responses that use Arrow
> would be helpful as well. Thank you Ian for leading this effort!
>
> It has mostly been covered already, but in the (ubiquitous) situation
> where a response contains some schema/table and some non-schema/table
> information there is some tension between throughput (best served by a
> JSON response plus one or more IPC stream responses) and latency (best
> served by a single HTTP response? JSON? IPC with metadata/header?). In
> addition to Antoine's list, I would add:
>
> - How to serve the same table in multiple requests (e.g., to saturate
> a network connection, or because separate worker nodes are generating
> results anyway).
> - How to inline a small schema/table into a single request with other
> metadata (I have seen this done as base64-encoded IPC in JSON, but
> perhaps there is a better way)
>
> If anybody is interested in experimenting, I repurposed a previous
> experiment I had as a flask app that can stream IPC to a client:
>
> https://github.com/paleolimbot/2023-11-21_arrow-over-http-scratchpad/pull/1/files
> .
>
> > - recommendations about compression
>
> Just a note that there is also Content-Encoding: gzip (for consumers
> like Arrow JS that don't currently support buffer compression but that
> can leverage the facilities of the browser/http library)
>
> Cheers!
>
> -dewey
>
>
> On Mon, Nov 20, 2023 at 8:30 PM Sutou Kouhei  wrote:
> >
> > Hi,
> >
> > > But how is the performance?
> >
> > It's faster than the original JSON based API.
> >
> > I implemented Apache Arrow support for a C# client. So I
> > measured only with Apache Arrow C# but the Apache Arrow
> > based API is faster than JSON based API.
> >
> > > Have you measured the throughput of this approach to see
> > > if it is comparable to using Flight SQL?
> >
> > Sorry. I didn't measure the throughput. In this case, the elapsed
> > time of one request/response pair is more important than
> > throughput. It was faster than the JSON based API and its
> > performance was sufficient.
> >
> > I couldn't compare to a Flight SQL based approach because
> > Groonga doesn't support Flight SQL yet.
> >
> > > Is this approach able to saturate a fast network
> > > connection?
> >
> > I think that we can't measure this with the Groonga case
> > because the Groonga case doesn't send data without
> > stopping. Here is one of the request patterns:
> >
> > 1. Groonga has log data partitioned by day
> > 2. Groonga does full text search against one partition (2023-11-01)
> > 3. Groonga sends the result to client as Apache Arrow
> >    streaming format record batches
> > 4. Groonga does full text search against the next partition (2023-11-02)
> > 5. Groonga sends the result to client as Apache Arrow
> >    streaming format record batches
> > 6. ...
> >
> > In this case, the result data isn't sent continuously (search
> > -> send -> search -> send -> ...), so it doesn't saturate a
> > fast network connection.
> >
> > (3. and 4. can be parallel but it's not implemented yet.)
> >
> > If we optimize this approach, this approach may be able to
> > saturate a fast network connection.
> >
> > > And what about the case in which the server wants to begin sending batches
> > > to the client before the total number of result batches / records is known?
> >
> > Ah, sorry. I forgot to explain the case. Groonga uses the
> > above approach for it.
> >
> > > - server should not return the result data in the body of a response to a
> > > query request; instead server should return a response body that gives
> > > URI(s) at which clients can GET the result data
> >
> > If we want to do this, the standard "Location" HTTP header
> > may be suitable.
> >
> > > - transmit result data in chunks (Transfer-Encoding: chunked), with
> > > recommendations about chunk size
> >
> > Ah, sorry. I forgot to explain this case too. Groonga uses
> > "Transfer-Encoding: chunked". But recommended chunk size may
> > be case-by-case... If a server can