[jira] [Commented] (ARROW-3151) [C++] Create Protocol Buffers interface for iterating over the semantic "rows" of a record batch, and accessing the rows using the protobuf API

2018-08-30 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597948#comment-16597948
 ] 

Wes McKinney commented on ARROW-3151:
-

Thanks Paul -- indeed this is a C++ initiative, but I agree it would be useful 
to align on some target use cases, requirements, etc. What I intended here is 
pretty orthogonal to memory management, or to any details of where the Arrow 
columnar memory comes from: a user might have some code that uses a 
Protobuf-based row interface, and we want to feed Arrow data to that code as 
quickly as possible.

> [C++] Create Protocol Buffers interface for iterating over the semantic 
> "rows" of a record batch, and accessing the rows using the protobuf API
> ---
>
> Key: ARROW-3151
> URL: https://issues.apache.org/jira/browse/ARROW-3151
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> The desired workflow:
> * User writes a .proto file describing the structure of a "row" as a Message
> * Given the generated pb.h bindings, an Arrow user can iterate over an 
> {{arrow::RecordBatch}}, each iteration populating an instance of the Row 
> message
> * The values of the row can then be accessed via the standard Protobuf APIs
> A corresponding interface could be developed to write a RecordBatch using 
> protobufs as input, but that could be its own project.
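
The workflow described above could look roughly like the following sketch. It uses only the standard library so that it compiles on its own: `Row` is a hypothetical stand-in for the class protoc would generate from a two-field message, `Batch` is a hypothetical stand-in for an {{arrow::RecordBatch}}, and `RowReader`, `Next`, and `EvenIdNames` are names invented for illustration. None of this is an existing Arrow or Protobuf API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a class protoc might generate from
//   message Row { int64 id = 1; string name = 2; }
// Only the accessor naming mimics generated code; this is not Protobuf.
class Row {
 public:
  int64_t id() const { return id_; }
  const std::string& name() const { return name_; }
  void set_id(int64_t v) { id_ = v; }
  void set_name(const std::string& v) { name_ = v; }

 private:
  int64_t id_ = 0;
  std::string name_;
};

// Hypothetical stand-in for an arrow::RecordBatch with two columns.
struct Batch {
  std::vector<int64_t> id;
  std::vector<std::string> name;
  std::size_t num_rows() const { return id.size(); }
};

// Walks the batch row by row, refilling one caller-owned Row per step.
class RowReader {
 public:
  explicit RowReader(const Batch& batch) : batch_(batch) {
    // All columns of a batch must have the same length.
    assert(batch.id.size() == batch.name.size());
  }

  // Populates *row with the next row; returns false when exhausted.
  bool Next(Row* row) {
    if (pos_ >= batch_.num_rows()) return false;
    row->set_id(batch_.id[pos_]);
    row->set_name(batch_.name[pos_]);
    ++pos_;
    return true;
  }

 private:
  const Batch& batch_;
  std::size_t pos_ = 0;
};

// Example consumer: collects the names of rows whose id is even,
// touching the data only through the message-style accessors.
std::vector<std::string> EvenIdNames(const Batch& batch) {
  RowReader reader(batch);
  Row row;  // reused across iterations
  std::vector<std::string> out;
  while (reader.Next(&row)) {
    if (row.id() % 2 == 0) out.push_back(row.name());
  }
  return out;
}
```

Reusing a single `Row` instance across iterations is one possible design choice here; it avoids allocating a message per row, at the cost of the caller having to copy anything it wants to keep.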



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3151) [C++] Create Protocol Buffers interface for iterating over the semantic "rows" of a record batch, and accessing the rows using the protobuf API

2018-08-30 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597920#comment-16597920
 ] 

Paul Rogers commented on ARROW-3151:


Let's see if we can coordinate on this. I'm starting work on a proposal for a 
"RowSet" interface to be ported over from Drill that provides a simple 
row-based API to read from, and write to, vectors. On the write side, the 
mechanism also enforces memory limits, which is the key reason Drill created 
the "RowSet" abstraction.
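
To make the write-side idea concrete, here is a minimal sketch of a row writer that appends to column vectors and refuses rows once a byte budget is reached. This is not Drill's RowSet API: `Columns`, `BoundedRowWriter`, `AddRow`, and the simple payload-size accounting are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical column storage: one vector per column.
struct Columns {
  std::vector<int64_t> id;
  std::vector<std::string> name;
};

// Hypothetical row writer with a byte budget. Names and accounting are
// illustrative assumptions, not Drill's RowSet API.
class BoundedRowWriter {
 public:
  explicit BoundedRowWriter(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Appends one row; returns false (and writes nothing) if the row
  // would push the estimated payload size past the budget.
  bool AddRow(int64_t id, const std::string& name) {
    const std::size_t row_bytes = sizeof(int64_t) + name.size();
    if (used_bytes_ + row_bytes > max_bytes_) return false;
    cols_.id.push_back(id);
    cols_.name.push_back(name);
    used_bytes_ += row_bytes;
    // Columns stay the same length because rows are written as a unit.
    assert(cols_.id.size() == cols_.name.size());
    return true;
  }

  std::size_t used_bytes() const { return used_bytes_; }
  const Columns& columns() const { return cols_; }

 private:
  std::size_t max_bytes_;
  std::size_t used_bytes_ = 0;
  Columns cols_;
};
```

A caller would typically treat a `false` return as the signal to hand off the current batch and start a new one.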

Given that this project will need a way to assemble a row from a bundle of 
vectors, the "columnar-to-row" mechanism of RowSet might be a way to populate 
the row buffer.

On the other hand, the RowSet code from Drill is in Java, and this is C++. 
Still, it might make sense to port the mechanism to C++ so it can be used in 
multiple contexts.

Are there any background docs I could read to better understand the project 
context and judge whether the above makes sense here? Thanks.
