[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264675#comment-16264675 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1349: ARROW-1047: [Java] [FollowUp] Change ArrowMagic to be non-public class
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346679417

+1

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java - Vectors
> Reporter: Wes McKinney
> Assignee: Bryan Cutler
> Labels: pull-request-available
> Fix For: 0.8.0
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
> accepts a WritableByteChannel to which the stream is written.
> It would be useful to be able to support other kinds of message framing and
> transport, like gRPC or HTTP. So rather than writing a complete Arrow stream
> as a single contiguous byte stream, the component messages (schema,
> dictionaries, and record batches) would be framed as separate messages in the
> underlying protocol.
> So if we were using Protocol Buffers and gRPC as the underlying transport for
> the stream, we could encapsulate components of an Arrow stream in objects
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream
> transport.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
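The framing idea in the issue description can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the interface that was merged for ARROW-1047: `ComponentKind`, `FramedMessageSink`, `CollectingSink`, and `FramingDemo` are hypothetical names, and the byte arrays stand in for already-serialized schema and record batch messages. The point it shows is that the writer hands each component message to the transport separately instead of appending everything to one contiguous byte channel.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical kinds of component messages in an Arrow stream. */
enum ComponentKind { SCHEMA, DICTIONARY, RECORD_BATCH }

/** Hypothetical transport-agnostic sink: one call per component message. */
interface FramedMessageSink {
  void send(ComponentKind kind, ByteBuffer serializedData);
}

/** In-memory sink standing in for a gRPC or HTTP transport (assumption). */
final class CollectingSink implements FramedMessageSink {
  final List<ComponentKind> kinds = new ArrayList<>();
  final List<byte[]> payloads = new ArrayList<>();

  @Override
  public void send(ComponentKind kind, ByteBuffer serializedData) {
    // Copy the payload so the caller may reuse its buffer afterwards.
    byte[] copy = new byte[serializedData.remaining()];
    serializedData.duplicate().get(copy);
    kinds.add(kind);
    payloads.add(copy);
  }
}

final class FramingDemo {
  /** Sends the schema first, then each record batch as its own framed message. */
  static void writeStream(FramedMessageSink sink, byte[] schema, List<byte[]> batches) {
    sink.send(ComponentKind.SCHEMA, ByteBuffer.wrap(schema));
    for (byte[] batch : batches) {
      sink.send(ComponentKind.RECORD_BATCH, ByteBuffer.wrap(batch));
    }
  }

  public static void main(String[] args) {
    CollectingSink sink = new CollectingSink();
    writeStream(sink, new byte[] {1, 2}, Arrays.asList(new byte[] {3}, new byte[] {4, 5}));
    System.out.println(sink.kinds);
  }
}
```

A real transport would replace `CollectingSink`, for example by wrapping each payload in something like the `ArrowMessagePB` message from the description, or by passing the buffer through without copying where the transport supports zero copy.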
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264676#comment-16264676 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm closed pull request #1349: ARROW-1047: [Java] [FollowUp] Change ArrowMagic to be non-public class
URL: https://github.com/apache/arrow/pull/1349

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
index a9310a608..f71318ee6 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
@@ -18,13 +18,11 @@
 package org.apache.arrow.vector.ipc;
-import org.apache.arrow.vector.ipc.WriteChannel;
-
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
-public class ArrowMagic {
+class ArrowMagic {
   private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.UTF_8);
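To illustrate why package-private visibility suits a helper like this, here is a minimal sketch. It is not the actual `ArrowMagic` source: the class name `FileMagicSketch` and its methods `length` and `startsWithMagic` are hypothetical, and only the `MAGIC` constant is taken from the diff above. Because the magic bytes matter only to the file format, a class without the `public` modifier keeps them out of the public API while file reader and writer code in the same package can still use them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical package-private helper (no "public" modifier), visible only
 * to classes in the same package, such as file reader and writer code.
 */
final class FileMagicSketch {
  // Same constant as in the diff above.
  private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.UTF_8);

  private FileMagicSketch() {}

  /** Number of magic bytes expected at the start of a file. */
  static int length() {
    return MAGIC.length;
  }

  /** Returns true if the given bytes begin with the magic sequence. */
  static boolean startsWithMagic(byte[] data) {
    if (data == null || data.length < MAGIC.length) {
      return false;
    }
    return Arrays.equals(MAGIC, Arrays.copyOfRange(data, 0, MAGIC.length));
  }
}
```

Code outside the package cannot reference `FileMagicSketch` at all, which is the effect the follow-up PR wanted for a class used only by the file implementation.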
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264438#comment-16264438 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346639218

Yes, that's a good point. I will change the PR to make ArrowMagic a non-public class.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264414#comment-16264414 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346635131

I suggest we do not merge this, because the magic number is only used in the file implementation, if that's OK.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263809#comment-16263809 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346531235

I was thinking of `ArrowMagic` as being part of the file protocol more than as a message, but it's fine with me if you prefer to move it there.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263230#comment-16263230 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1349: ARROW-1047: [Java] [FollowUp] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346454899

cc @BryanCutler
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263229#comment-16263229 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss opened a new pull request #1349: ARROW-1047: [Java] [FollowUp] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349

Move ArrowMagic from vector.ipc to vector.ipc.message package.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263162#comment-16263162 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444248

Thanks @wesm @icexelloss @elahrvivaz and @siddharthteotia !
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263161#comment-16263161 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444019

I'd like to keep the `vector.ipc.message` package; I think these classes generally define messages that serialize to FB (FlatBuffers).
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263125#comment-16263125 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm closed pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
index 3091bc4da..ce6b5164a 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
index ab8fa6e45..6e45305bf 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
@@ -22,8 +22,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
 import org.apache.arrow.vector.types.pojo.Schema;
 import org.apache.commons.cli.CommandLine;
 import org.apache.commons.cli.CommandLineParser;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
index 6722b30fa..3db01f40c 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 import java.io.File;
 import java.io.FileInputStream;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
index d2b35e65a..666f1ddea 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
@@ -22,11 +22,11 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowBlock;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.file.json.JsonFileReader;
-import org.apache.arrow.vector.file.json.JsonFileWriter;
+import org.apache.arrow.vector.ipc.message.ArrowBlock;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.JsonFileReader;
+import org.apache.arrow.vector.ipc.JsonFileWriter;
 import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.types.pojo.Schema;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
index ef1a11f6b..42d336af9 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
 import java.io.File;
 import
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263122#comment-16263122 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346440612

Reviewing the past comments, since these classes are generally internal, I think it's fine. master is broken right now (ARROW-1845), so I will merge this.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263033#comment-16263033 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346431446

I do not have a strong feeling either; I think the `vector.ipc.message` subnamespace is fine. Although maybe we can move `ArrowMagic` to the `message` subnamespace? @BryanCutler, what do you think?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263034#comment-16263034 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346431446

I do not have a strong feeling either; I think the `vector.ipc.message` subnamespace is fine. Although maybe we can move `ArrowMagic` to the `message` subnamespace? Sorry for the oversight. @BryanCutler, what do you think?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263020#comment-16263020 ]

ASF GitHub Bot commented on ARROW-1047:

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346429291

Squashed and rebased so we can get a passing build. While we are waiting, do we also want the `vector.ipc.message` subnamespace? Do not have a strong feeling.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262925#comment-16262925 ]

ASF GitHub Bot commented on ARROW-1047:

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346415200

LGTM. +1
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262873#comment-16262873 ]

ASF GitHub Bot commented on ARROW-1047:

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152614571

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
Sure, I'm fine with this. I'll change it now.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261934#comment-16261934 ]

ASF GitHub Bot commented on ARROW-1047:

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346235425

This looks good to me. Once the package name hierarchy is settled, I think this should be good to go.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261747#comment-16261747 ]

ASF GitHub Bot commented on ARROW-1047:

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439921

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
I prefer `ipc.ArrowStreamReader` to `ipc.stream.ArrowStreamReader`.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261735#comment-16261735 ]

ASF GitHub Bot commented on ARROW-1047:

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439165

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
@icexelloss do you have an opinion on this? Would be good to get this patch in soon to facilitate testing.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261521#comment-16261521 ]

ASF GitHub Bot commented on ARROW-1047:

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152356580

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
Do you think the same for the file and JSON readers, e.g. `ipc.ArrowFileReader`? I made these subpackages because there were some supporting files specific to just the file reader, so they could be grouped together. But I'm OK either way; @icexelloss brought this up here: https://github.com/apache/arrow/pull/1259#issuecomment-340562836
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261508#comment-16261508 ]

ASF GitHub Bot commented on ARROW-1047:

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152406142

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
These classes are all quite similar: the file format is very nearly the stream format, plus a file footer and magic numbers at the start and end. I think it would make sense to keep them in a flat package namespace (but I'm not a Java expert).
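The remark that the file format is nearly the stream format plus a footer and magic numbers at both ends can be illustrated with a small sketch. It is deliberately simplified and rests on assumptions: the magic string `ARROW1` is taken from the Arrow format documentation rather than from this thread, the padding and footer-length field of the real file layout are omitted, and `MagicCheckSketch`, `wrap` and `looksLikeFile` are hypothetical names, not Arrow APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, simplified sketch; not the Arrow implementation. */
public class MagicCheckSketch {

    /** Magic bytes per the Arrow format documentation (an assumption relative to this thread). */
    static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

    /**
     * Builds a simplified "file": magic, stream bytes, footer, magic.
     * The real layout also has padding and a footer-length field, omitted here.
     */
    public static byte[] wrap(byte[] streamBytes, byte[] footer) {
        ByteBuffer out = ByteBuffer.allocate(
            MAGIC.length + streamBytes.length + footer.length + MAGIC.length);
        out.put(MAGIC).put(streamBytes).put(footer).put(MAGIC);
        return out.array();
    }

    /** Returns true when the data both starts and ends with the magic bytes. */
    public static boolean looksLikeFile(byte[] data) {
        if (data.length < 2 * MAGIC.length) {
            return false;
        }
        byte[] head = Arrays.copyOfRange(data, 0, MAGIC.length);
        byte[] tail = Arrays.copyOfRange(data, data.length - MAGIC.length, data.length);
        return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
    }

    public static void main(String[] args) {
        byte[] file = wrap(new byte[] {1, 2, 3}, new byte[] {9});
        System.out.println(looksLikeFile(file));
        System.out.println(looksLikeFile(new byte[] {1, 2, 3}));
    }
}
```

Because the only differences are the wrapper bytes around an otherwise identical message sequence, the file and stream classes can share most of their logic, which is the argument made above for a flat package.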
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261192#comment-16261192 ]

ASF GitHub Bot commented on ARROW-1047:

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346112886

@siddharthteotia whatever is easier for this, but I would like to hear that I didn't break anything on your side :) It's pretty easy to rebase this, so no need to rush.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259718#comment-16259718 ]

ASF GitHub Bot commented on ARROW-1047:

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152087227

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
I would make this `ipc.ArrowStreamReader` but not `ipc.stream`
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259717#comment-16259717 ]

ASF GitHub Bot commented on ARROW-1047:

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152087227

## File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;

Review comment:
I would either make this `ipc.ArrowStreamReader` but not `ipc.stream`
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257849#comment-16257849 ]

ASF GitHub Bot commented on ARROW-1047:

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-345407725

@siddharthteotia is this something you would like to run with the Dremio suite of tests before merging?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240897#comment-16240897 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-342286410

@siddharthteotia that's fair enough, I don't want to complicate the refactoring. I mostly just want to make sure that these changes don't make it harder to merge the java-vector-refactor branch into master. I can try that out locally and report back.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239462#comment-16239462 ]

ASF GitHub Bot commented on ARROW-1047:
---

siddharthteotia commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341959674

@BryanCutler, are you suggesting cherry-picking your changes into the refactor branch and reverting the commit in case things don't look good? I am not entirely sure what the best option is here, but I believe that adding an orthogonal set of changes to the java-vector-refactor branch at this point may not be a good idea. However, I don't want to block other work, so feel free to proceed based on your best judgement.

Note that there are currently two patches in that branch. While making changes in Dremio and debugging test failures, I had to go back and make some changes in the vector code (minor only, no redesign). Currently those additional changes are in Dremio's fork (as I wanted to make quick progress), and I will put up a PR against the java-vector-refactor branch for the third patch very soon. It is better to do that last, when testing with Dremio completes.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234560#comment-16234560 ]

ASF GitHub Bot commented on ARROW-1047:
---

siddharthteotia commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341202073

Yes, I am concerned that this will make the patches in the java-vector-refactor branch hard to merge into master. Secondly, the nature of the changes suggests that we should be testing this with Dremio as well. I would have loved to offer help, but I am in the process of moving Dremio to the new code in the java-vector-refactor branch. I would prefer to have these changes merged after the java-vector-refactor changes are merged into master.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234543#comment-16234543 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341199634

One thing I am not sure about is whether this patch will make java-refactor-branch hard to merge (cc @siddharthteotia for comment). Maybe we should keep all refactor changes in java-refactor-branch to make it easier to merge? Not sure though.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234506#comment-16234506 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148339956

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema schema) throws IOException
   /**
    * Deserializes a schema object. Format is from serialize().
    *
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
    * @return the deserialized object
    * @throws IOException if something went wrong
    */
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-    Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws IOException {

Review comment:
Ok. Agreed, this can be a follow-up.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234505#comment-16234505 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148339853

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
     return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
   }
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {

Review comment:
Ok.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234499#comment-16234499 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148338482

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
     return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
   }
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {

Review comment:
Yeah, I think it's ok as is but this seems to be used only in a test. How about we do a followup PR to refine these functions and we can discuss there?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234494#comment-16234494 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148337707

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema schema) throws IOException
   /**
    * Deserializes a schema object. Format is from serialize().
    *
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
    * @return the deserialized object
    * @throws IOException if something went wrong
    */
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-    Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws IOException {

Review comment:
I think it's ok to include reading the message as part of deserialization, and some messages also require reading another chunk after the message. I do think the behavior of these functions could be made more consistent, but we should probably do that as a followup.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234429#comment-16234429 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148328047

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageReader.java ##
@@ -0,0 +1,37 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+
+import java.io.IOException;
+
+public interface MessageReader {
+
+  Message readNextMessage() throws IOException;
+
+  ArrowBuf readMessageBody(Message message, BufferAllocator allocator) throws IOException;

Review comment:
Yeah, I meant to say that I still need to go through these changes and make sure everything is documented properly.
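Editor's note: the two-method shape of `MessageReader` quoted above is what lets a transport hand over messages one at a time instead of as one contiguous byte stream. The following is a minimal sketch of that idea, not the code from the pull request. Every type in it (`Msg`, `SimpleMessageReader`, `QueueMessageReader`) is a hypothetical stand-in; it uses `byte[]` where the real interface uses `ArrowBuf` and `BufferAllocator`, and it assumes that end of stream is signalled by returning `null`, which the quoted diff does not state.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified stand-ins; these are not the Arrow classes from the pull request. */
final class FramedReaderSketch {

  /** Stand-in for the flatbuffer Message: a type tag plus the length of the body that follows. */
  static final class Msg {
    final String type;
    final int bodyLength;

    Msg(String type, int bodyLength) {
      this.type = type;
      this.bodyLength = bodyLength;
    }
  }

  /** Mirrors the two-method shape in the diff, with byte[] in place of ArrowBuf. */
  interface SimpleMessageReader {
    /** Returns the next message, or null at end of stream (an assumption of this sketch). */
    Msg readNextMessage() throws IOException;

    byte[] readMessageBody(Msg message) throws IOException;
  }

  /** A reader fed by already-framed messages, as a gRPC or HTTP transport might deliver them. */
  static final class QueueMessageReader implements SimpleMessageReader {
    private final Deque<Msg> messages = new ArrayDeque<>();
    private final Deque<byte[]> bodies = new ArrayDeque<>();
    private byte[] pendingBody;

    void enqueue(String type, byte[] body) {
      messages.add(new Msg(type, body.length));
      bodies.add(body);
    }

    @Override
    public Msg readNextMessage() {
      Msg next = messages.poll();
      pendingBody = (next == null) ? null : bodies.poll();
      return next;
    }

    @Override
    public byte[] readMessageBody(Msg message) throws IOException {
      if (pendingBody == null || pendingBody.length != message.bodyLength) {
        throw new IOException("No matching body pending for message of type " + message.type);
      }
      byte[] body = pendingBody;
      pendingBody = null;
      return body;
    }
  }

  /** A consumer that depends only on the interface, never on how the bytes were framed. */
  static int countBodyBytes(SimpleMessageReader reader) throws IOException {
    int total = 0;
    for (Msg m = reader.readNextMessage(); m != null; m = reader.readNextMessage()) {
      if (m.bodyLength > 0) {
        total += reader.readMessageBody(m).length;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    QueueMessageReader reader = new QueueMessageReader();
    reader.enqueue("SCHEMA", new byte[0]);
    reader.enqueue("RECORD_BATCH", new byte[16]);
    System.out.println(countBodyBytes(reader));
  }
}
```

A channel-backed implementation could sit behind the same interface, which is the decoupling the issue title asks for.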
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227286#comment-16227286 ]

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148088136

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
     return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
   }
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {

Review comment:
This method won't read any generic message; it only works with RecordBatches or DictionaryBatches, hence the name. In the streaming format the first message after the schema could be either a record batch or a dictionary batch, and this method is there to handle either case.
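Editor's note: the behavior described in the review comment above (accept a record batch or a dictionary batch, nothing else) can be sketched as a dispatch on the message's header type. This is an illustration under assumptions, not the body of `deserializeMessageBatch`: `HeaderType`, `BatchMessage`, `RecordBatchMsg`, `DictionaryBatchMsg`, and `deserializeBatch` are hypothetical names, and rejecting a schema message with an exception is one possible way to make the exclusion explicit.

```java
/** Hypothetical stand-ins illustrating "record batch or dictionary batch, nothing else". */
final class BatchDispatchSketch {

  /** Stand-in for the header type carried by each message. */
  enum HeaderType { SCHEMA, DICTIONARY_BATCH, RECORD_BATCH }

  /** Stand-in for the common supertype of the two batch kinds. */
  interface BatchMessage {
    String describe();
  }

  static final class RecordBatchMsg implements BatchMessage {
    final int rows;

    RecordBatchMsg(int rows) {
      this.rows = rows;
    }

    @Override
    public String describe() {
      return "record batch, rows=" + rows;
    }
  }

  /** A dictionary batch wraps a record batch of dictionary values together with an id. */
  static final class DictionaryBatchMsg implements BatchMessage {
    final long id;
    final RecordBatchMsg values;

    DictionaryBatchMsg(long id, RecordBatchMsg values) {
      this.id = id;
      this.values = values;
    }

    @Override
    public String describe() {
      return "dictionary batch, id=" + id + ", rows=" + values.rows;
    }
  }

  /** Handles either batch kind and rejects anything else, including a schema message. */
  static BatchMessage deserializeBatch(HeaderType type, long dictionaryId, int rows) {
    switch (type) {
      case RECORD_BATCH:
        return new RecordBatchMsg(rows);
      case DICTIONARY_BATCH:
        return new DictionaryBatchMsg(dictionaryId, new RecordBatchMsg(rows));
      default:
        throw new IllegalArgumentException("Unexpected message type: " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(deserializeBatch(HeaderType.DICTIONARY_BATCH, 7L, 3).describe());
    System.out.println(deserializeBatch(HeaderType.RECORD_BATCH, 0L, 100).describe());
  }
}
```

A stream consumer can call such a method in a loop after the schema and branch on the concrete type it gets back.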
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227258#comment-16227258 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148083371

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
     return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
   }
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {

Review comment:
The word "Batch" in the function name is a bit unintuitive. I kind of feel "Message" is a better term than "MessageBatch". Should we maybe rename this to `deserializeMessage`? Also, this method doesn't seem to exclude schema messages explicitly, which also feels a bit weird.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227234#comment-16227234 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148080722

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java ##
@@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema schema) throws IOException
   /**
    * Deserializes a schema object. Format is from serialize().
    *
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
    * @return the deserialized object
    * @throws IOException if something went wrong
    */
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-    Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws IOException {

Review comment:
This method seems closer to `read schema` than to `deserialize schema`.

```
public static Schema deserializeSchema(Message message)
```
seems to make more sense to me.

Maybe this method can be made into:
```
public static Schema readSchema(MessageReader reader) {
  Message message = reader.readNextMessage();
  return deserializeSchema(message);
}
```
?

@BryanCutler what do you think?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
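The split proposed in the review comment above, where `readSchema` does the IO and `deserializeSchema` only converts an already-read message, might look roughly like the following. This is a minimal sketch and not the code from the pull request: `SketchMessage`, `SketchSchema`, `SketchMessageReader` and `SchemaReadSketch` are hypothetical stand-ins for the Arrow types, the header constants are toy values, and returning `null` at end of stream is an assumption.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for the flatbuffer Message; NOT the real Arrow class.
final class SketchMessage {
  static final byte SCHEMA = 1;        // toy header type for this sketch
  static final byte RECORD_BATCH = 3;  // toy header type for this sketch

  final byte headerType;
  final List<String> fieldNames;       // toy payload: field names only

  SketchMessage(byte headerType, List<String> fieldNames) {
    this.headerType = headerType;
    this.fieldNames = fieldNames;
  }
}

// Hypothetical stand-in for Schema.
final class SketchSchema {
  final List<String> fieldNames;

  SketchSchema(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }
}

// Hypothetical stand-in for MessageReader; null at end of stream is assumed.
interface SketchMessageReader {
  SketchMessage readNextMessage() throws IOException;
}

final class SchemaReadSketch {
  private SchemaReadSketch() {}

  // Pure conversion: no IO, only checks the header type and builds the schema.
  static SketchSchema deserializeSchema(SketchMessage message) throws IOException {
    if (message.headerType != SketchMessage.SCHEMA) {
      throw new IOException("Expected schema but header was " + message.headerType);
    }
    return new SketchSchema(message.fieldNames);
  }

  // IO plus conversion: pulls the next message from the reader, then delegates.
  static SketchSchema readSchema(SketchMessageReader reader) throws IOException {
    SketchMessage message = reader.readNextMessage();
    if (message == null) {
      throw new IOException("Unexpected end of input. Missing schema.");
    }
    return deserializeSchema(message);
  }
}
```

With this shape, a caller that already holds a message (for example one received over gRPC) can call `deserializeSchema` directly, while channel-based callers go through `readSchema`.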
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227221#comment-16227221 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148079120

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageReader.java ##
@@ -0,0 +1,37 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+
+import java.io.IOException;
+
+public interface MessageReader {
+
+  Message readNextMessage() throws IOException;
+
+  ArrowBuf readMessageBody(Message message, BufferAllocator allocator) throws IOException;

Review comment:
Maybe add a bit of doc on what these methods are supposed to do? It's not very clear how to use `readNextMessage` and `readMessageBody`.

This is an automated message from the Apache Git Service.
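As an illustration of the kind of documentation the review comment above asks for, here is a sketch of a documented reader contract together with a toy in-memory implementation. Everything in it is an assumption made for illustration: `DocMessage`, `DocMessageReader` and `InMemoryMessageReader` are hypothetical names, the body is returned as a `byte[]` instead of an `ArrowBuf` (so no `BufferAllocator` is needed), the framing is one length byte followed by the body, and the `null`-at-end-of-stream and call-ordering rules are a guess at a reasonable contract rather than what the pull request specifies.

```java
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical stand-in for the flatbuffer Message: records only the body length. */
final class DocMessage {
  final int bodyLength;

  DocMessage(int bodyLength) {
    this.bodyLength = bodyLength;
  }
}

/** Hypothetical, simplified variant of the MessageReader interface under review. */
interface DocMessageReader {

  /**
   * Reads the metadata of the next message in the sequence.
   *
   * @return the next message, or null if there are no more messages (assumed)
   * @throws IOException if the underlying source fails
   */
  DocMessage readNextMessage() throws IOException;

  /**
   * Reads the body that follows the given message. In this sketch it must be
   * called after readNextMessage returned that message and before the next
   * call to readNextMessage.
   *
   * @param message the message whose body should be read
   * @return the body bytes, possibly empty
   * @throws IOException if fewer bytes are available than the message declares
   */
  byte[] readMessageBody(DocMessage message) throws IOException;
}

/** Toy implementation over a byte array; framing is [length byte][body bytes]. */
final class InMemoryMessageReader implements DocMessageReader {
  private final byte[] data;
  private int pos = 0;

  InMemoryMessageReader(byte[] data) {
    this.data = data;
  }

  @Override
  public DocMessage readNextMessage() throws IOException {
    if (pos >= data.length) {
      return null; // end of stream
    }
    int length = data[pos++] & 0xFF; // unsigned length prefix
    return new DocMessage(length);
  }

  @Override
  public byte[] readMessageBody(DocMessage message) throws IOException {
    int end = pos + message.bodyLength;
    if (end > data.length) {
      throw new IOException("Unexpected end of input while reading message body.");
    }
    byte[] body = Arrays.copyOfRange(data, pos, end);
    pos = end;
    return body;
  }
}
```

A caller would loop on `readNextMessage()` until it returns `null`, calling `readMessageBody` once per message in between.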
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227217#comment-16227217 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148078355

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java ##
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.ReadChannel;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+public class MessageChannelReader implements MessageReader {
+
+  private ReadChannel in;
+
+  public MessageChannelReader(ReadChannel in) {
+    this.in = in;
+  }
+
+  public Message readNextMessage() throws IOException {

Review comment:
Add override?

This is an automated message from the Apache Git Service.
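For context on the "Add override?" suggestion above: annotating an interface implementation with `@Override` lets the compiler reject the method if the interface signature later changes, instead of silently leaving an unrelated method behind. A minimal sketch, using the hypothetical names `OverrideSketchReader` and `OverrideSketchChannelReader` and a `String` return type rather than the real Arrow types:

```java
// Hypothetical stand-in for the MessageReader interface.
interface OverrideSketchReader {
  String readNextMessage();
}

// Hypothetical stand-in for MessageChannelReader.
final class OverrideSketchChannelReader implements OverrideSketchReader {

  // If readNextMessage() were renamed or given a different parameter list in the
  // interface, this annotation would turn the mismatch into a compile-time error.
  @Override
  public String readNextMessage() {
    return "message";
  }
}
```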
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227214#comment-16227214 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148078259

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java ##
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.ReadChannel;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+public class MessageChannelReader implements MessageReader {
+
+  private ReadChannel in;
+
+  public MessageChannelReader(ReadChannel in) {

Review comment:
add override?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226999#comment-16226999 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340803970

> I sort of prefer having separate packages for the different readers/writers. There are some supporting files that are specific to certain formats, like ArrowMagic and InvalidArrowFileException, and I like pushing it down to the feature that uses them. I think users will be more likely to import reader/writer from 1 format for a particular use too. I'm not tied to this though, we can simplify if that's the consensus.

> imo i like the current package layout with file, stream, json, message.

Sounds good to me.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226731#comment-16226731 ]

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340753115

imo i like the current package layout with file, stream, json, message.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225879#comment-16225879 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340606975

Thanks @elahrvivaz, @icexelloss and @wesm !

>What do people feel about having less sub namespaces?

I sort of prefer having separate packages for the different readers/writers. There are some supporting files that are specific to certain formats, like `ArrowMagic` and `InvalidArrowFileException`, and I like pushing it down to the feature that uses them. I think users will be more likely to import reader/writer from 1 format for a particular use too. I'm not tied to this though, we can simplify if that's the consensus.

>Also maybe JsonFileReader -> ArrowJsonReader for more consistent naming?

+1 for me on renaming this
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225650#comment-16225650 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340565857

Backward compatibility wise, I think we should probably change this along with vector changes in one arrow release?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225640#comment-16225640 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340563276

Also maybe `JsonFileReader` -> `ArrowJsonReader` for more consistent naming?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225638#comment-16225638 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340562836

@BryanCutler This looks great!

What do people feel about having less sub namespaces?

Original,
```
o.a.a.vector.ipc
  file
    ArrowFileReader
    ArrowFileWriter
    ArrowMagic
  stream
    ArrowStreamReader
    ArrowStreamWriter
  json
    JsonFileReader
    JsonFileWriter
  message
    ArrowBlock
    ArrowFooter
    ArrowMessage
    ArrowRecordBatch
    ArrowDictionaryBatch
    FBSerializable
    FBSerializables
    MessageSerializer
  ArrowReader
  ArrowWriter
  ReadChannel
  WriteChannel
```

How do people feel about:
```
o.a.a.vector.ipc
  message
    ArrowBlock
    ArrowFooter
    ArrowMessage
    ArrowRecordBatch
    ArrowDictionaryBatch
    FBSerializable
    FBSerializables
    MessageSerializer
  ArrowReader
  ArrowWriter
  ArrowFileReader
  ArrowFileWriter
  ArrowMagic
  ArrowStreamReader
  ArrowStreamWriter
  ReadChannel
  WriteChannel
  JsonFileReader
  JsonFileWriter
```

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225621#comment-16225621 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147812345

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/file/ArrowFileWriter.java ##
 @@ -16,14 +16,18 @@
   * limitations under the License.
   */

 -package org.apache.arrow.vector.file;
 +package org.apache.arrow.vector.ipc.file;

Review comment:
How do you feel about getting rid of the "file" and "stream" sub-namespaces, i.e.
```
org.apache.arrow.vector.ipc.ArrowFileWriter
```
```
org.apache.arrow.vector.ipc.ArrowStreamWriter
```
I think the two namespaces `file` and `stream` are not very complicated, so they can probably be combined.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225608#comment-16225608 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340558357

> I'm not sure, all of the current messages are geared towards vectors so it makes sense to keep it there. Are you thinking of possible messages in the future that might not be vector related?

I think this is fine for now. Longer term, I think we can improve the current package hierarchy, in which every API lives under the `org.apache.arrow.vector` namespace. A hierarchy similar to the C++ one might make more sense: `o.a.a.vector`, `o.a.a.ipc`, etc. But there is no need to do it here, I think.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225531#comment-16225531 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340546565

> having ipc should be a top level package rather than a subpackage under vector, i.e. org.apache.arrow.ipc

I'm not sure, all of the current messages are geared towards vectors so it makes sense to keep it there. Are you thinking of possible messages in the future that might not be vector related?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225530#comment-16225530 ]

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147799085

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
      this.root = new VectorSchemaRoot(schema, vectors, 0);
      this.loader = new VectorLoader(root);
      this.dictionaries = Collections.unmodifiableMap(dictionaries);
 +
 +    // Read and load all dictionaries from schema
 +    for (int i = 0; i < dictionaries.size(); i++) {

Review comment:
Yeah, an overloaded method would be fine. I agree that having to load a batch before reading dictionaries is a bit confusing for the general use case.
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225520#comment-16225520 ]

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147797804

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
      this.root = new VectorSchemaRoot(schema, vectors, 0);
      this.loader = new VectorLoader(root);
      this.dictionaries = Collections.unmodifiableMap(dictionaries);
 +
 +    // Read and load all dictionaries from schema
 +    for (int i = 0; i < dictionaries.size(); i++) {

Review comment:
Yeah, we could still do that. I think it just comes down to either reading the dictionaries after the schema, or reading them before the first data batch. I thought it made a little more sense to read them with the schema; otherwise the user could create the reader, load the schema, and try to decode with it but fail.

Would it work for you to overload `ArrowReader.readSchema` so that it can return the original schema before the dictionaries are loaded? Similarly, if using the stream format, you could make a subclass of `MessageReader` (introduced here) and react after reading a schema message.

If not, I'm OK with reading them before data batches and documenting for the user that you can't decode until batches are read.
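The two orderings weighed in this review comment can be sketched without any Arrow classes. What follows is a minimal, hypothetical illustration only: `SketchMessage` and `SketchReader` are stand-ins invented for this note, and the string payloads replace real serialized messages. It is not the `ArrowReader` or `MessageReader` API from the pull request. It shows the second option: `readSchema()` returns the schema without touching any dictionaries, and the dictionaries are loaded only when the first batch is requested.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for one framed message: a kind tag plus a payload.
final class SketchMessage {
    enum Kind { SCHEMA, DICTIONARY, RECORD_BATCH }

    final Kind kind;
    final String payload;

    SketchMessage(Kind kind, String payload) {
        this.kind = kind;
        this.payload = payload;
    }
}

// Hypothetical reader: the schema is available before any dictionary is loaded.
final class SketchReader {
    private final Deque<SketchMessage> source;
    private final List<String> dictionaries = new ArrayList<>();
    private String schema;               // null until the schema message is read
    private boolean dictionariesLoaded;

    SketchReader(List<SketchMessage> messages) {
        this.source = new ArrayDeque<>(messages);
    }

    // Reads only the leading schema message; dictionary messages stay unread.
    String readSchema() {
        if (schema == null) {
            SketchMessage first = source.poll();
            if (first == null || first.kind != SketchMessage.Kind.SCHEMA) {
                throw new IllegalStateException("stream must start with a schema message");
            }
            schema = first.payload;
        }
        return schema;
    }

    // Consumes the run of dictionary messages that follows the schema, once.
    private void ensureDictionariesLoaded() {
        readSchema();
        if (dictionariesLoaded) {
            return;
        }
        while (!source.isEmpty() && source.peek().kind == SketchMessage.Kind.DICTIONARY) {
            dictionaries.add(source.poll().payload);
        }
        dictionariesLoaded = true;
    }

    // Returns the next batch payload, or null at end of stream.
    String nextBatch() {
        ensureDictionariesLoaded();
        SketchMessage next = source.poll();
        if (next == null) {
            return null;
        }
        if (next.kind != SketchMessage.Kind.RECORD_BATCH) {
            throw new IllegalStateException("expected a record batch");
        }
        return next.payload;
    }

    int loadedDictionaryCount() {
        return dictionaries.size();
    }
}
```

With this shape, a caller that only wants the schema pays nothing for dictionaries, at the cost the comment points out: nothing can be decoded until `nextBatch()` has been called at least once. Moving the call to `ensureDictionariesLoaded()` into `readSchema()` would give the other ordering.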
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225348#comment-16225348 ]

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340517275

At a high level, @BryanCutler, how do you feel about making `ipc` a top-level package rather than a subpackage under `vector`, i.e. `org.apache.arrow.ipc`?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224883#comment-16224883 ]

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147695368

## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
      this.root = new VectorSchemaRoot(schema, vectors, 0);
      this.loader = new VectorLoader(root);
      this.dictionaries = Collections.unmodifiableMap(dictionaries);
 +
 +    // Read and load all dictionaries from schema
 +    for (int i = 0; i < dictionaries.size(); i++) {

Review comment:
Sometimes it's useful to be able to read just the schema out of a message, without loading any dictionaries or record batches. Is there a way to preserve that functionality?
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224275#comment-16224275 ]

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340318241

@BryanCutler at a high level this sounds great to me. cc @nongli also to take a look
[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
[ https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014648#comment-16014648 ]

Wes McKinney commented on ARROW-1047:
---

The benefit of this work is that stream readers and writers would not need to know about the underlying transport (whether the messages are being written directly to a byte channel, or placed in a queue to be sent asynchronously through some RPC protocol).

> [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing
> ---
>
>                 Key: ARROW-1047
>                 URL: https://issues.apache.org/jira/browse/ARROW-1047
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java - Vectors
>            Reporter: Wes McKinney
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java accepts a WritableByteChannel where the stream is written.
> It would be useful to be able to support other kinds of message framing and transport, like gRPC or HTTP. So rather than writing a complete Arrow stream as a single contiguous byte stream, the component messages (schema, dictionaries, and record batches) would be framed as separate messages in the underlying protocol.
> So if we were using ProtocolBuffers and gRPC as the underlying transport for the stream, we could encapsulate components of an Arrow stream in objects like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream transport.
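The decoupling described in this comment can be illustrated with a small sketch. Everything named here (`MessageSink`, `ChannelSink`, `QueueSink`, `SketchStreamWriter`) is hypothetical and invented for this note; none of it is the Arrow Java API, and the 4-byte big-endian length prefix is an arbitrary choice for the example, not the Arrow stream framing. The point is only that the writer knows the order of messages (schema, dictionaries, record batches) while the sink decides whether they land in one contiguous byte stream or as separate units handed to some RPC layer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.List;
import java.util.Queue;

// Hypothetical abstraction: the writer hands over one serialized message at a time.
interface MessageSink {
    void send(byte[] serializedMessage);
}

// Contiguous byte stream: each message is written with a 4-byte length prefix.
// The prefix layout is an assumption made for this sketch only.
final class ChannelSink implements MessageSink {
    private final WritableByteChannel channel;

    ChannelSink(WritableByteChannel channel) {
        this.channel = channel;
    }

    @Override
    public void send(byte[] serializedMessage) {
        ByteBuffer frame = ByteBuffer.allocate(4 + serializedMessage.length);
        frame.putInt(serializedMessage.length).put(serializedMessage);
        frame.flip();
        try {
            // A channel may accept fewer bytes than offered, so loop until drained.
            while (frame.hasRemaining()) {
                channel.write(frame);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Message-oriented transport: each message becomes its own queued unit, which an
// RPC layer (gRPC, HTTP, ...) could later send asynchronously.
final class QueueSink implements MessageSink {
    private final Queue<byte[]> outbox;

    QueueSink(Queue<byte[]> outbox) {
        this.outbox = outbox;
    }

    @Override
    public void send(byte[] serializedMessage) {
        outbox.add(serializedMessage.clone());
    }
}

// Hypothetical writer: knows the message order, not the transport.
final class SketchStreamWriter {
    private final MessageSink sink;

    SketchStreamWriter(MessageSink sink) {
        this.sink = sink;
    }

    void writeStream(byte[] schema, List<byte[]> dictionaries, List<byte[]> batches) {
        sink.send(schema);
        for (byte[] dictionary : dictionaries) {
            sink.send(dictionary);
        }
        for (byte[] batch : batches) {
            sink.send(batch);
        }
    }
}
```

The same `SketchStreamWriter` can be constructed with a `ChannelSink` wrapping `Channels.newChannel(outputStream)` or with a `QueueSink` wrapping an `ArrayDeque`; only the sink changes. A zero-copy transport would presumably want the sink to accept buffers rather than `byte[]`, which this sketch does not attempt.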