[jira] [Updated] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5377:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Develop interface for writing a RecordBatch IPC stream into
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in recent mailing list thread
> https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E
> The only viable process at the moment for getting an accurate report of
> stream size is to write a simulated stream using {{MockOutputStream}}. This
> is suboptimal for a couple of reasons:
> * Flatbuffers metadata must be created twice
> * Record batch disassembly into IpcPayload must be performed twice
> It seems like an interface with a very constrained public API could be
> provided to deconstruct a sequence of RecordBatches and report the size of
> the produced IPC stream (based on metadata sizes, and padding), and then this
> deconstructed set of IPC payloads can be written out to a stream (e.g. using
> {{FixedSizeBufferWriter}})

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
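The two-phase idea in the issue description (deconstruct once, report the size, then write the same payloads into pre-allocated space) can be sketched roughly as follows. This is not Arrow's API: `Payload`, `PaddedSize`, `StreamSize`, and `WritePayloads` are hypothetical names, and the framing (a 4-byte length prefix before the metadata, everything padded to 8 bytes) is an assumed simplification of the IPC stream layout, chosen only to show that the size report and the write can share one set of deconstructed payloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one deconstructed record batch: the serialized
// metadata plus its raw body buffers. This is NOT Arrow's IpcPayload.
struct Payload {
  std::vector<uint8_t> metadata;
  std::vector<std::vector<uint8_t>> body_buffers;
};

constexpr std::size_t kAlignment = 8;  // assumed padding boundary

// Round n up to the next multiple of kAlignment.
inline std::size_t PaddedSize(std::size_t n) {
  return (n + kAlignment - 1) / kAlignment * kAlignment;
}

// Phase 1: report the stream size from the payloads alone; nothing is written.
// Assumed layout per payload: [int32 metadata length][metadata][padding],
// followed by each body buffer, individually padded.
std::size_t StreamSize(const std::vector<Payload>& payloads) {
  std::size_t total = 0;
  for (const Payload& p : payloads) {
    total += PaddedSize(sizeof(int32_t) + p.metadata.size());
    for (const auto& buf : p.body_buffers) total += PaddedSize(buf.size());
  }
  return total;
}

// Phase 2: write the same payloads into caller-provided space (for example a
// memory-mapped region). Returns the number of bytes written.
std::size_t WritePayloads(const std::vector<Payload>& payloads, uint8_t* dst,
                          std::size_t capacity) {
  const std::size_t required = StreamSize(payloads);
  if (required > capacity) throw std::length_error("destination too small");

  std::size_t pos = 0;
  // Copy n bytes at the current position, then zero-fill up to `padded`.
  auto put = [&](const uint8_t* src, std::size_t n, std::size_t padded) {
    if (n > 0) std::memcpy(dst + pos, src, n);
    std::memset(dst + pos + n, 0, padded - n);
    pos += padded;
  };

  for (const Payload& p : payloads) {
    // Length prefix and metadata share one padded frame.
    const int32_t meta_len = static_cast<int32_t>(p.metadata.size());
    const std::size_t framed = sizeof(meta_len) + p.metadata.size();
    std::memcpy(dst + pos, &meta_len, sizeof(meta_len));
    pos += sizeof(meta_len);
    put(p.metadata.data(), p.metadata.size(), PaddedSize(framed) - sizeof(meta_len));
    for (const auto& buf : p.body_buffers) {
      put(buf.data(), buf.size(), PaddedSize(buf.size()));
    }
  }
  assert(pos == required);  // the size report and the write must agree
  return pos;
}
```

A caller would build the `Payload` vector once, size the destination (for instance, the length of a memory map) from `StreamSize`, and pass the mapped pointer to `WritePayloads`. The metadata is serialized a single time and reused by both phases, which is the saving the issue asks for relative to a simulated write.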
[jira] [Updated] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-5377:
--------------------------------
    Fix Version/s:     (was: 0.16.0)

> [C++] Develop interface for writing a RecordBatch IPC stream into
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
[jira] [Updated] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-5377:
--------------------------------
    Fix Version/s: 1.0.0

> [C++] Develop interface for writing a RecordBatch IPC stream into
> pre-allocated space (e.g. memory map) that avoids unnecessary serialization
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-5377
>                 URL: https://issues.apache.org/jira/browse/ARROW-5377
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 1.0.0