hi Ivan,

Currently all implementations of Arrow treat record batch protocol
messages as atomic entities, in the sense that IPC protocol readers
expect to have access to the entire message in virtual address space.

If Arrow protocol payloads need to be split on the wire, usually
that's handled by the underlying transport layer. For example, in
Flight (which uses gRPC as its default transport), gRPC breaks large
messages into smaller buffers internally.
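
To illustrate the layering, here is a minimal Python sketch. It is not
Arrow or gRPC code: split_message, reassemble, and the 64K limit are
hypothetical names and values chosen for this example. It assumes the
transport cuts one opaque message into bounded pieces and the receiver
joins them back together before an IPC reader ever sees the bytes.

```python
from typing import Iterable, Iterator

MAX_CHUNK = 64 * 1024  # hypothetical transport limit, taken from the 64K in the question


def split_message(message: bytes, max_chunk: int = MAX_CHUNK) -> Iterator[bytes]:
    """Transport side: cut one opaque message into pieces of at most max_chunk bytes."""
    for start in range(0, len(message), max_chunk):
        yield message[start:start + max_chunk]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Receiving side: rebuild the whole message before handing it to an IPC reader."""
    return b"".join(chunks)


# Stand-in bytes for one record batch message that carries a 70K string value.
message = b"x" * (70 * 1024)
chunks = list(split_message(message))
restored = reassemble(chunks)
```

The point of the sketch is that the splitting is invisible to the
message format: the reader only ever receives `restored`, which is the
complete message again.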

- Wes

On Mon, Jun 24, 2019 at 8:29 PM Ivan Popivanov <[email protected]> wrote:
>
> Hello,
>
> Looking at these examples and the documentation, it seems that a record batch 
> cannot span multiple messages. Is my understanding correct?
>
> Here is the scenario I am considering: two columns, an int and a string. Let's
> assume that we want the maximum message size to be 64K. If there is a row
> with a string value of let's say 70K, it has to span multiple batches. Does 
> the current message format support this?
>
> If it doesn't, then another layer is needed to create the messages when a 
> column size is a multiple of the message size.
>
> Thanks
> Ivan
