[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264675#comment-16264675
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1349: ARROW-1047: [Java] [FollowUp] Change ArrowMagic 
to be non-public class
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346679417
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WritableByteChannel to which the stream is written.
> It would be useful to be able to support other kinds of message framing and 
> transport, such as gRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 
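
The idea described in the issue can be sketched as follows. This is only an illustration under stated assumptions, not the interface that was eventually merged: `MessageSink`, `ChannelSink`, `ListSink`, and `FramingSketch` are hypothetical names, and the 4-byte length prefix is an illustrative framing, not the Arrow IPC encapsulation format. The point is that the writer hands over one serialized message at a time, and the sink decides whether to append it to a contiguous byte stream or keep it as a separate unit for an RPC-style transport.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical: receives one already-serialized message (schema, dictionary, batch). */
interface MessageSink {
  void send(ByteBuffer serializedMessage);
}

/** Writes every message onto one contiguous channel, prefixed by its length. */
final class ChannelSink implements MessageSink {
  private final WritableByteChannel out;

  ChannelSink(WritableByteChannel out) {
    this.out = out;
  }

  @Override
  public void send(ByteBuffer msg) {
    // Illustrative framing only: 4-byte big-endian length, then the payload.
    ByteBuffer prefix = ByteBuffer.allocate(4).putInt(msg.remaining());
    prefix.flip();
    writeFully(prefix);
    writeFully(msg.duplicate());
  }

  private void writeFully(ByteBuffer buf) {
    try {
      while (buf.hasRemaining()) {
        out.write(buf);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Keeps each message as its own unit, the way an RPC transport would frame it. */
final class ListSink implements MessageSink {
  final List<byte[]> messages = new ArrayList<>();

  @Override
  public void send(ByteBuffer msg) {
    byte[] copy = new byte[msg.remaining()];
    msg.duplicate().get(copy);
    messages.add(copy);
  }
}

final class FramingSketch {
  /** The "writer": it knows the message order but nothing about the transport. */
  static void writeStream(MessageSink sink, List<byte[]> parts) {
    for (byte[] part : parts) {
      sink.send(ByteBuffer.wrap(part));
    }
  }

  public static void main(String[] args) {
    // Placeholder payloads standing in for serialized schema and record batch.
    List<byte[]> parts = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4});

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    writeStream(new ChannelSink(Channels.newChannel(bytes)), parts);
    System.out.println("contiguous bytes: " + bytes.size());

    ListSink framed = new ListSink();
    writeStream(framed, parts);
    System.out.println("separate messages: " + framed.messages.size());
  }
}
```

With this split, supporting a new transport would mean adding another `MessageSink` implementation rather than changing the writer.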



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264676#comment-16264676
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm closed pull request #1349: ARROW-1047: [Java] [FollowUp] Change ArrowMagic 
to be non-public class
URL: https://github.com/apache/arrow/pull/1349
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
index a9310a608..f71318ee6 100644
--- a/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
+++ b/java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowMagic.java
@@ -18,13 +18,11 @@
 
 package org.apache.arrow.vector.ipc;
 
-import org.apache.arrow.vector.ipc.WriteChannel;
-
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
 
-public class ArrowMagic {
+class ArrowMagic {
 
   private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.UTF_8);
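
The diff above keeps the `MAGIC` constant and only narrows the class to package-private visibility. As a rough illustration of what such a helper is for, here is a minimal sketch, assuming a hypothetical stand-in class `MagicSketch` with a hypothetical method `startsWithMagic`; only the `MAGIC` definition is taken from the diff, and the real `ArrowMagic` methods are not shown in this message.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the package-private ArrowMagic class.
// Only the MAGIC constant comes from the diff; everything else is assumed.
final class MagicSketch {
  private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.UTF_8);

  static final int MAGIC_LENGTH = MAGIC.length;

  private MagicSketch() {}

  /** Returns true if the buffer begins with the magic bytes. */
  static boolean startsWithMagic(byte[] buffer) {
    return buffer.length >= MAGIC.length
        && Arrays.equals(MAGIC, Arrays.copyOfRange(buffer, 0, MAGIC.length));
  }

  public static void main(String[] args) {
    byte[] header = "ARROW1 followed by data".getBytes(StandardCharsets.UTF_8);
    System.out.println(startsWithMagic(header));              // true
    System.out.println(startsWithMagic(new byte[] {1, 2, 3})); // false
  }
}
```

Because the class has no `public` modifier, a check like this is only callable from classes in the same package, which matches the intent of keeping the magic number an implementation detail.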
 


 




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264438#comment-16264438
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move 
ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346639218
 
 
   Yes, that's a good point. I will change the PR to make ArrowMagic a 
non-public class.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264414#comment-16264414
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move ArrowMagic to 
ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346635131
 
 
   I suggest we do not merge this, if that's OK, because the magic number is 
only used in the file implementation.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263809#comment-16263809
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1349: ARROW-1047: [Java] [FollowUp] Move 
ArrowMagic to ipc.message package
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346531235
 
 
   I was thinking of `ArrowMagic` more as part of the file protocol than a 
message, but it's fine with me if you prefer to move it there.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263230#comment-16263230
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1349: ARROW-1047: [Java] [FollowUp] Add Generic 
Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349#issuecomment-346454899
 
 
   cc @BryanCutler 




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263229#comment-16263229
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss opened a new pull request #1349: ARROW-1047: [Java] [FollowUp] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1349
 
 
   Move ArrowMagic from vector.ipc to vector.ipc.message package.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263162#comment-16263162
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444248
 
 
   Thanks @wesm @icexelloss @elahrvivaz and @siddharthteotia !




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263161#comment-16263161
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346444019
 
 
   I'd like to keep the `vector.ipc.message` package; I think these generally 
define messages that serialize to FlatBuffers (FB).




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263125#comment-16263125
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm closed pull request #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
index 3091bc4da..ce6b5164a 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
@@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
index ab8fa6e45..6e45305bf 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileRoundtrip.java
@@ -22,8 +22,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
 import org.apache.arrow.vector.types.pojo.Schema;
 import org.apache.commons.cli.CommandLine;
 import org.apache.commons.cli.CommandLineParser;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
index 6722b30fa..3db01f40c 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/FileToStream.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowStreamWriter;
 
 import java.io.File;
 import java.io.FileInputStream;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
index d2b35e65a..666f1ddea 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/Integration.java
@@ -22,11 +22,11 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowBlock;
-import org.apache.arrow.vector.file.ArrowFileReader;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.file.json.JsonFileReader;
-import org.apache.arrow.vector.file.json.JsonFileWriter;
+import org.apache.arrow.vector.ipc.message.ArrowBlock;
+import org.apache.arrow.vector.ipc.ArrowFileReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.JsonFileReader;
+import org.apache.arrow.vector.ipc.JsonFileWriter;
 import org.apache.arrow.vector.types.pojo.DictionaryEncoding;
 import org.apache.arrow.vector.types.pojo.Field;
 import org.apache.arrow.vector.types.pojo.Schema;
diff --git a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
index ef1a11f6b..42d336af9 100644
--- a/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
+++ b/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java
@@ -21,8 +21,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.file.ArrowFileWriter;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.ArrowFileWriter;
+import org.apache.arrow.vector.ipc.ArrowStreamReader;
 
 import java.io.File;
 import 

[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263122#comment-16263122
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346440612
 
 
   Reviewing the past comments, since these classes are generally internal, I 
think it's fine. master is broken right now (ARROW-1845), so I will merge this.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263033#comment-16263033
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346431446
 
 
   I do not have a strong feeling either; I think the `vector.ipc.message` 
subnamespace is fine. Although maybe we can move `ArrowMagic` to the `message` 
subnamespace? @BryanCutler what do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 
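
The decoupling proposed in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: `MessageKind`, `MessageSink`, `ContiguousSink`, `PerMessageSink` and `FramingSketch` are hypothetical names that do not exist in Arrow, the message bodies are placeholder bytes rather than real Arrow messages, and the 4-byte length prefix is a stand-in, not Arrow's actual stream encoding. The point is that the code emitting schema and record batch messages does not need to know whether they end up in one contiguous byte stream or as separate messages in an underlying protocol.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All types here are hypothetical and only illustrate the idea in the issue.
final class FramingSketch {
  /** Emits a schema followed by record batches, without knowing how they are framed. */
  static void writeStream(MessageSink sink, byte[] schema, List<byte[]> batches) {
    sink.accept(MessageKind.SCHEMA, schema);
    // Dictionary messages are omitted here for brevity.
    for (byte[] batch : batches) {
      sink.accept(MessageKind.RECORD_BATCH, batch);
    }
  }

  public static void main(String[] args) {
    byte[] schema = "schema".getBytes(StandardCharsets.UTF_8);
    List<byte[]> batches = Arrays.asList(
        "batch-0".getBytes(StandardCharsets.UTF_8),
        "batch-1".getBytes(StandardCharsets.UTF_8));

    ContiguousSink contiguous = new ContiguousSink();
    PerMessageSink perMessage = new PerMessageSink();
    writeStream(contiguous, schema, batches);
    writeStream(perMessage, schema, batches);

    System.out.println("contiguous bytes: " + contiguous.toByteArray().length);
    System.out.println("separate frames:  " + perMessage.frames.size());
  }
}

enum MessageKind { SCHEMA, DICTIONARY, RECORD_BATCH }

/** Receives one already-serialized stream component at a time. */
interface MessageSink {
  void accept(MessageKind kind, byte[] body);
}

/** Framing 1: a single contiguous byte stream, each message length-prefixed. */
final class ContiguousSink implements MessageSink {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();

  @Override
  public void accept(MessageKind kind, byte[] body) {
    // Placeholder framing: 4-byte big-endian length, then the body.
    byte[] prefix = ByteBuffer.allocate(4).putInt(body.length).array();
    out.write(prefix, 0, prefix.length);
    out.write(body, 0, body.length);
  }

  byte[] toByteArray() {
    return out.toByteArray();
  }
}

/** Framing 2: each component stays a separate message, as an RPC transport might send it. */
final class PerMessageSink implements MessageSink {
  final List<MessageKind> kinds = new ArrayList<>();
  final List<byte[]> frames = new ArrayList<>();

  @Override
  public void accept(MessageKind kind, byte[] body) {
    kinds.add(kind);
    frames.add(body.clone());
  }
}
```

In a design along these lines, a gRPC or HTTP transport would presumably supply its own `MessageSink`-like implementation, and a zero-copy transport could hand over buffers directly instead of copying them as `PerMessageSink` does here.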



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263034#comment-16263034
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346431446
 
 
   I do not have a strong feeling either; I think the `vector.ipc.message` 
subnamespace is fine. Although maybe we can move `ArrowMagic` to the `message` 
subnamespace? Sorry for the oversight. @BryanCutler what do you think?




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263020#comment-16263020
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346429291
 
 
   Squashed and rebased so we can get a passing build. While we are waiting, do 
we also want the `vector.ipc.message` subnamespace? I do not have a strong 
feeling either way.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262925#comment-16262925
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346415200
 
 
   LGTM. +1




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262873#comment-16262873
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152614571
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   Sure, I'm fine with this.  I'll change it now




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261934#comment-16261934
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346235425
 
 
   This looks good to me. Once the package name hierarchy is settled, I think 
this should be good to go. 




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261747#comment-16261747
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439921
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   I prefer `ipc.ArrowStreamReader` to `ipc.stream.ArrowStreamReader`




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261735#comment-16261735
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152439165
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   @icexelloss do you have an opinion on this? Would be good to get this patch 
in soon to facilitate testing




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261521#comment-16261521
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152356580
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   Do you think the same for the file and JSON readers, e.g. `ipc.ArrowFileReader`? 
 I made these subpackages because there were some supporting files specific to 
just the file reader, so they could be grouped together.  But I'm OK either 
way; @icexelloss brought this up here: 
https://github.com/apache/arrow/pull/1259#issuecomment-340562836




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261508#comment-16261508
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152406142
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   These classes are all quite similar (the file format is very nearly the 
stream format, plus a file footer and magic numbers at start and end), so I 
think it would make sense to keep them in a flat package namespace (but I'm 
not a Java expert).
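
   A minimal sketch of the "magic numbers at start and end" point, assuming only 
what the comment states plus the fact that the Arrow file magic string is 
`ARROW1`. `MagicCheckSketch` and its methods are hypothetical names, not the 
`ArrowMagic` class from the PR, and the footer and padding layout between the 
two magic strings is deliberately not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; this is not Arrow's ArrowMagic implementation.
final class MagicCheckSketch {
  // "ARROW1" is the magic string of the Arrow file format.
  static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

  /** True when the data begins with the magic string. */
  static boolean startsWithMagic(byte[] data) {
    return data.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOfRange(data, 0, MAGIC.length), MAGIC);
  }

  /** True when the data ends with the magic string. */
  static boolean endsWithMagic(byte[] data) {
    return data.length >= MAGIC.length
        && Arrays.equals(
            Arrays.copyOfRange(data, data.length - MAGIC.length, data.length), MAGIC);
  }

  /** True when the magic appears at both ends, as it would for a file but not a bare stream. */
  static boolean looksLikeFile(byte[] data) {
    return data.length >= 2 * MAGIC.length && startsWithMagic(data) && endsWithMagic(data);
  }

  public static void main(String[] args) {
    byte[] fileLike = "ARROW1<stream-like body and footer>ARROW1"
        .getBytes(StandardCharsets.US_ASCII);
    byte[] streamLike = "<stream-like body>".getBytes(StandardCharsets.US_ASCII);

    System.out.println("file-like:   " + looksLikeFile(fileLike));
    System.out.println("stream-like: " + looksLikeFile(streamLike));
  }
}
```

   Because the only difference checked here is the surrounding magic, a reader 
for the file format can plausibly share most of its logic with the stream 
reader, which fits the suggestion to keep the classes in one flat package.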




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261192#comment-16261192
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-346112886
 
 
   @siddharthteotia whatever is easier for this, but I would like to hear that 
I didn't break anything on your side :)  It's pretty easy to rebase this, so no 
need to rush.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259718#comment-16259718
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152087227
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   I would make this `ipc.ArrowStreamReader` but not `ipc.stream`




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259717#comment-16259717
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r152087227
 
 

 ##
 File path: java/tools/src/main/java/org/apache/arrow/tools/EchoServer.java
 ##
 @@ -23,8 +23,8 @@
 import org.apache.arrow.memory.BufferAllocator;
 import org.apache.arrow.memory.RootAllocator;
 import org.apache.arrow.vector.VectorSchemaRoot;
-import org.apache.arrow.vector.stream.ArrowStreamReader;
-import org.apache.arrow.vector.stream.ArrowStreamWriter;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamReader;
+import org.apache.arrow.vector.ipc.stream.ArrowStreamWriter;
 
 Review comment:
   I would make this `ipc.ArrowStreamReader` but not `ipc.stream`




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257849#comment-16257849
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-345407725
 
 
   @siddharthteotia is this something you would like to run with the Dremio 
suite of tests before merging?




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240897#comment-16240897
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-342286410
 
 
   @siddharthteotia that's fair enough, I don't want to complicate the 
refactoring.  I mostly just want to make sure that these changes don't make 
it harder to merge the java-vector-refactor branch into master.  I can try 
that out locally and report back.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239462#comment-16239462
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

siddharthteotia commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341959674
 
 
   @BryanCutler, are you suggesting that we cherry-pick your changes into the 
refactor branch and revert the commit if things don't look good?
   
   I am not entirely sure what the best option is here, but I believe that 
adding an orthogonal set of changes to the java-vector-refactor branch at this 
point may not be a good idea. However, I don't want to block other work, so 
feel free to proceed based on your best judgment.
   
   Note that there are currently two patches in that branch. While making 
changes in Dremio and debugging test failures, I had to go back and make some 
changes in the vector code (minor only, no redesign). Currently those additional 
changes are in Dremio's fork (as I wanted to make quick progress), and I will 
put up a PR against the java-vector-refactor branch for the third patch very 
soon. It is better to do that last, once testing with Dremio completes.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234560#comment-16234560
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

siddharthteotia commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341202073
 
 
   Yes, I am concerned that this will make the patches in the 
java-vector-refactor branch hard to merge into master. Secondly, the nature of 
the changes suggests that we should be testing this with Dremio as well. I 
would have loved to offer help, but I am in the process of moving Dremio to the 
new code in the java-vector-refactor branch. 
   
   I would prefer to have these changes merged after the java-vector-refactor 
changes are merged into master.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234543#comment-16234543
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-341199634
 
 
   One thing I am not sure about is whether this patch will make 
java-refactor-branch hard to merge - cc @siddharthteotia for comment.
   
   Maybe we should keep all refactor changes in java-refactor-branch to make it 
easier to merge? Not sure though.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234506#comment-16234506
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148339956
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema schema) throws IOException
    /**
     * Deserializes a schema object. Format is from serialize().
     *
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
     * @return the deserialized object
     * @throws IOException if something went wrong
     */
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-    Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws IOException {
 
 Review comment:
   Ok. Agree this can be a follow-up.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234505#comment-16234505
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148339853
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
      return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
    }
 
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {
 
 Review comment:
   Ok.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234499#comment-16234499
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148338482
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
      return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
    }
 
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {
 
 Review comment:
   Yeah, I think it's ok as is but this seems to be used only in a test.  How 
about we do a followup PR to refine these functions and we can discuss there?




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234494#comment-16234494
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148337707
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema schema) throws IOException
    /**
     * Deserializes a schema object. Format is from serialize().
     *
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
     * @return the deserialized object
     * @throws IOException if something went wrong
     */
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-    Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws IOException {
 
 Review comment:
   I think it's ok to include reading the message as part of deserialization, 
and some messages also require reading another chunk after the message.  I do 
think the behavior of these functions could be made more consistent, but 
we should probably do that as a follow-up.
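
   As a rough illustration of the split discussed above (metadata first, then an 
optional body chunk), here is a minimal sketch. It is not the code from this PR: 
`SketchMessage`, `SketchMessageReader`, `InMemoryMessageReader` and 
`MessageReaderSketch` are hypothetical names, `byte[]` stands in for `ArrowBuf`, 
and the `BufferAllocator` parameter is left out to keep the example self-contained.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the Flatbuffers Message metadata (not the Arrow class).
final class SketchMessage {
  final String type;     // for example "Schema" or "RecordBatch"
  final int bodyLength;  // number of body bytes that follow the metadata

  SketchMessage(String type, int bodyLength) {
    this.type = type;
    this.bodyLength = bodyLength;
  }
}

// Simplified analogue of the MessageReader interface in the diff:
// byte[] replaces ArrowBuf and the allocator parameter is dropped.
interface SketchMessageReader {
  SketchMessage readNextMessage() throws IOException;  // null at end of stream

  byte[] readMessageBody(SketchMessage message) throws IOException;
}

// Reader backed by already-framed messages, as a message-oriented
// transport might deliver them. Bodies must be non-null (use an empty array).
final class InMemoryMessageReader implements SketchMessageReader {
  private final Queue<SketchMessage> messages = new ArrayDeque<>();
  private final Queue<byte[]> bodies = new ArrayDeque<>();
  private SketchMessage current;
  private byte[] currentBody;

  void enqueue(SketchMessage message, byte[] body) {
    messages.add(message);
    bodies.add(body);
  }

  @Override
  public SketchMessage readNextMessage() {
    current = messages.poll();
    currentBody = bodies.poll();
    return current;
  }

  @Override
  public byte[] readMessageBody(SketchMessage message) throws IOException {
    // Only the body of the most recently returned message can be read.
    if (message == null || message != current
        || currentBody.length != message.bodyLength) {
      throw new IOException("Body does not match the current message");
    }
    return currentBody;
  }
}

final class MessageReaderSketch {
  // Drains the reader, fetching a body only for messages that declare one.
  static List<String> drain(SketchMessageReader reader) throws IOException {
    List<String> seen = new ArrayList<>();
    SketchMessage message;
    while ((message = reader.readNextMessage()) != null) {
      int bodyBytes =
          message.bodyLength > 0 ? reader.readMessageBody(message).length : 0;
      seen.add(message.type + ":" + bodyBytes);
    }
    return seen;
  }
}
```

   The caller never sees how messages were framed, which is the property the 
issue asks for; whether deserialization should also own the `readNextMessage` 
call is the open question left for the follow-up.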




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234429#comment-16234429
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148328047
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageReader.java
 ##
 @@ -0,0 +1,37 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+
+import java.io.IOException;
+
+public interface MessageReader {
+
+  Message readNextMessage() throws IOException;
+
+  ArrowBuf readMessageBody(Message message, BufferAllocator allocator) throws IOException;
 
 Review comment:
   Yeah, I meant to say that I still need to go through these changes and make 
sure everything is documented properly.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227286#comment-16227286
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148088136
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -377,8 +371,8 @@ public static ArrowDictionaryBatch deserializeDictionaryBatch(ReadChannel in,
      return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
    }
 
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, BufferAllocator alloc) throws IOException {
-    Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, BufferAllocator alloc) throws IOException {
 
 Review comment:
   This method won't read any generic message; it only works with RecordBatches 
or DictionaryBatches, hence the name.
   In the streaming format, the first message after the schema could be either a 
record batch or a dictionary batch, and this method handles either case.
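
   A minimal sketch of that dispatch, assuming a hypothetical `SketchHeader` enum 
as a stand-in for the real message header type and returning plain labels instead 
of Arrow objects; it is not the code from this PR:

```java
import java.io.IOException;

// Hypothetical stand-in for the header type carried in the message metadata.
enum SketchHeader { SCHEMA, DICTIONARY_BATCH, RECORD_BATCH }

final class BatchDispatchSketch {
  // Accepts only the two batch message kinds described above;
  // any other header (for example a second schema) is rejected.
  static String deserializeMessageBatch(SketchHeader header) throws IOException {
    switch (header) {
      case RECORD_BATCH:
        return "record batch";
      case DICTIONARY_BATCH:
        return "dictionary batch";
      default:
        throw new IOException("Unexpected message header type: " + header);
    }
  }
}
```

   In a real reader each branch would go on to read the message body and build 
the corresponding batch object.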


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WritableByteChannel to which the stream is written.
> It would be useful to be able to support other kinds of message framing and 
> transport, like gRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data = 1;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227258#comment-16227258
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148083371
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -377,8 +371,8 @@ public static ArrowDictionaryBatch 
deserializeDictionaryBatch(ReadChannel in,
 return new ArrowDictionaryBatch(dictionaryBatchFB.id(), recordBatch);
   }
 
-  public static ArrowMessage deserializeMessageBatch(ReadChannel in, 
BufferAllocator alloc) throws IOException {
-Message message = deserializeMessage(in);
+  public static ArrowMessage deserializeMessageBatch(MessageReader reader, 
BufferAllocator alloc) throws IOException {
 
 Review comment:
   The word "Batch" in the function name is a bit unintuitive. I feel that 
"Message" is a better term than "MessageBatch".
   
   Should we rename this to `deserializeMessage`? 
   
   Also, this method doesn't seem to exclude schema messages explicitly, which 
also feels a bit odd.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227234#comment-16227234
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148080722
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java
 ##
 @@ -102,12 +96,12 @@ public static long serialize(WriteChannel out, Schema 
schema) throws IOException
   /**
* Deserializes a schema object. Format is from serialize().
*
-   * @param in the channel to deserialize from
+   * @param reader the reader interface to deserialize from
* @return the deserialized object
* @throws IOException if something went wrong
*/
-  public static Schema deserializeSchema(ReadChannel in) throws IOException {
-Message message = deserializeMessage(in);
+  public static Schema deserializeSchema(MessageReader reader) throws 
IOException {
 
 Review comment:
   This method seems closer to `read schema` than to `deserialize schema`.
   
   ```
   public static Schema deserializeSchema(Message message)
   ```
   seems to make more sense to me.
   
   Maybe this method could become:
   ```
   public static Schema readSchema(MessageReader reader) throws IOException {
     // readNextMessage() is declared to throw IOException
     Message message = reader.readNextMessage();
     return deserializeSchema(message);
   }
   ```
   ?
   
   @BryanCutler what do you think?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227221#comment-16227221
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148079120
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageReader.java
 ##
 @@ -0,0 +1,37 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+
+import java.io.IOException;
+
+public interface MessageReader {
+
+  Message readNextMessage() throws IOException;
+
+  ArrowBuf readMessageBody(Message message, BufferAllocator allocator) throws 
IOException;
 
 Review comment:
   Maybe add a bit of documentation on what these methods are supposed to do? 
It's not very clear how to use `readNextMessage` and `readMessageBody`.
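
As a rough illustration of the kind of documentation being asked for, here is a self-contained sketch. The contract in the Javadoc (null at end of stream, body read directly after its header) is an assumption about the intended usage. `Header`, `Reader` and `ArrayReader` are hypothetical stand-ins that use `byte[]` instead of `ArrowBuf`, and the length-prefixed framing is invented for the example rather than taken from the Arrow format:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class MessageReaderDocSketch {

  /** Hypothetical stand-in for the flatbuffer Message; carries only the body length. */
  static final class Header {
    final int bodyLength;

    Header(int bodyLength) {
      this.bodyLength = bodyLength;
    }
  }

  interface Reader {
    /**
     * Reads the metadata of the next message in the stream.
     *
     * @return the next message header, or null if no messages remain
     * @throws IOException if the stream is truncated or unreadable
     */
    Header readNextMessage() throws IOException;

    /**
     * Reads the body of the given message. Intended to be called once,
     * directly after the readNextMessage() call that returned the header.
     *
     * @param message the header returned by the preceding readNextMessage()
     * @return the message body bytes
     * @throws IOException if fewer bytes are available than the header declares
     */
    byte[] readMessageBody(Header message) throws IOException;
  }

  /** In-memory reader over records laid out as [4-byte big-endian body length][body]. */
  static final class ArrayReader implements Reader {
    private final ByteBuffer buf;

    ArrayReader(byte[] stream) {
      this.buf = ByteBuffer.wrap(stream);
    }

    @Override
    public Header readNextMessage() throws IOException {
      if (!buf.hasRemaining()) {
        return null;
      }
      if (buf.remaining() < 4) {
        throw new IOException("Truncated length prefix");
      }
      int length = buf.getInt();
      if (length < 0) {
        throw new IOException("Negative body length: " + length);
      }
      return new Header(length);
    }

    @Override
    public byte[] readMessageBody(Header message) throws IOException {
      if (buf.remaining() < message.bodyLength) {
        throw new IOException("Truncated message body");
      }
      byte[] body = new byte[message.bodyLength];
      buf.get(body);
      return body;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] stream = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3}).array();
    Reader reader = new ArrayReader(stream);
    Header header;
    while ((header = reader.readNextMessage()) != null) {
      System.out.println("body bytes: " + reader.readMessageBody(header).length);
    }
  }
}
```

The loop in `main` shows the calling pattern the Javadoc is meant to spell out: read a header, read its body, repeat until `null`.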


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227217#comment-16227217
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148078355
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java
 ##
 @@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.ReadChannel;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+public class MessageChannelReader implements MessageReader {
+
+  private ReadChannel in;
+
+  public MessageChannelReader(ReadChannel in) {
+this.in = in;
+  }
+
+  public Message readNextMessage() throws IOException {
 
 Review comment:
   Add override?
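
For context, a small sketch of what the annotation buys. The `Reader` and `ChannelReader` names are hypothetical and simplified; they are not the classes from this PR:

```java
final class OverrideSketch {

  interface Reader {
    String readNextMessage();
  }

  static final class ChannelReader implements Reader {
    // With @Override, the compiler rejects this method if the interface
    // signature is later renamed or changed, instead of silently leaving
    // an unrelated method behind.
    @Override
    public String readNextMessage() {
      return "message";
    }
  }
}
```

The annotation only applies to methods that implement or override a supertype method, so it would not be valid on a constructor.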


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227214#comment-16227214
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r148078259
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java
 ##
 @@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector.ipc.message;
+
+
+import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.flatbuf.Message;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.vector.ipc.ReadChannel;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+public class MessageChannelReader implements MessageReader {
+
+  private ReadChannel in;
+
+  public MessageChannelReader(ReadChannel in) {
 
 Review comment:
   add override?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226999#comment-16226999
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340803970
 
 
   > I sort of prefer having separate packages for the different 
readers/writers. There are some supporting files that are specific to certain 
formats, like ArrowMagic and InvalidArrowFileException, and I like pushing it 
down to the feature that uses them. I think users will be more likely to import 
reader/writer from 1 format for a particular use too. I'm not tied to this 
though, we can simplify if that's the consensus.
   
   
   > imo i like the current package layout with file, stream, json, message.
   
   Sounds good to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226731#comment-16226731
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340753115
 
 
   imo i like the current package layout with file, stream, json, message.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225879#comment-16225879
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340606975
 
 
   Thanks @elahrvivaz, @icexelloss and @wesm !
   
   >What do people feel about having less sub namespaces?
   
   I sort of prefer having separate packages for the different readers/writers. 
There are some supporting files that are specific to certain formats, like 
`ArrowMagic` and `InvalidArrowFileException`, and I like pushing them down to the 
feature that uses them. I think users will also be more likely to import a 
reader/writer from one format for a particular use. I'm not tied to this 
though; we can simplify if that's the consensus. 
   
   >Also maybe JsonFileReader -> ArrowJsonReader for more consistent naming?
   
   +1 for me on renaming this


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225650#comment-16225650
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340565857
 
 
   For backward compatibility, I think we should probably make this change 
together with the vector changes in a single Arrow release?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add generalized stream writer and reader interfaces that are decoupled 
> from IO / message framing
> ---
>
> Key: ARROW-1047
> URL: https://issues.apache.org/jira/browse/ARROW-1047
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>  Labels: pull-request-available
>
> cc [~julienledem] [~elahrvivaz] [~nongli]
> The ArrowWriter 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/ArrowWriter.java
>  accepts a WriteableByteChannel where the stream is written
> It would be useful to be able to support other kinds of message framing and 
> transport, like GRPC or HTTP. So rather than writing a complete Arrow stream 
> as a single contiguous byte stream, the component messages (schema, 
> dictionaries, and record batches) would be framed as separate messages in the 
> underlying protocol. 
> So if we were using ProtocolBuffers and gRPC as the underlying transport for 
> the stream, we could encapsulate components of an Arrow stream in objects 
> like:
> {code:language=protobuf}
> message ArrowMessagePB {
>   required bytes serialized_data;
> }
> {code}
> If the transport supports zero copy, that is obviously better than 
> serializing then parsing a protocol buffer.
> We should do this work in C++ as well to support more flexible stream 
> transport. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225640#comment-16225640
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340563276
 
 
   Also maybe `JsonFileReader` -> `ArrowJsonReader` for more consistent naming?




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225638#comment-16225638
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340562836
 
 
   @BryanCutler This looks great! How do people feel about having fewer 
sub-namespaces?
   
   Original:
   ```
   o.a.a.vector.ipc
     file
       ArrowFileReader
       ArrowFileWriter
       ArrowMagic
     stream
       ArrowStreamReader
       ArrowStreamWriter
     json
       JsonFileReader
       JsonFileWriter
     message
       ArrowBlock
       ArrowFooter
       ArrowMessage
       ArrowRecordBatch
       ArrowDictionaryBatch
       FBSerializable
       FBSerializables
       MessageSerializer
     ArrowReader
     ArrowWriter
     ReadChannel
     WriteChannel
   ```
   How do people feel about:
   ```
   o.a.a.vector.ipc
     message
       ArrowBlock
       ArrowFooter
       ArrowMessage
       ArrowRecordBatch
       ArrowDictionaryBatch
       FBSerializable
       FBSerializables
       MessageSerializer
     ArrowReader
     ArrowWriter
     ArrowFileReader
     ArrowFileWriter
     ArrowMagic
     ArrowStreamReader
     ArrowStreamWriter
     ReadChannel
     WriteChannel
     JsonFileReader
     JsonFileWriter
   ```




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225621#comment-16225621
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147812345
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/file/ArrowFileWriter.java
 ##
 @@ -16,14 +16,18 @@
  * limitations under the License.
  */
 
-package org.apache.arrow.vector.file;
+package org.apache.arrow.vector.ipc.file;
 
 Review comment:
   How do you feel about getting rid of the "file" and "stream" sub-namespaces, 
i.e.
   
   ```
   org.apache.arrow.vector.ipc.ArrowFileWriter
   ```
   ```
   org.apache.arrow.vector.ipc.ArrowStreamWriter
   ```
   I think the two namespaces `file` and `stream` are not very complicated, so 
they can probably be combined.
   





[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225608#comment-16225608
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340558357
 
 
   > I'm not sure, all of the current messages are geared towards vectors so it 
   > makes sense to keep it there. Are you thinking of possible messages in the 
   > future that might not be vector related?
   
   I think this is fine for now.
   
   Longer term, I think we can improve the current package hierarchy, where 
all APIs live under the namespace `org.apache.arrow.vector`. A hierarchy 
similar to C++ might make more sense: `o.a.a.vector`, `o.a.a.ipc`, etc. But 
there is no need to do it here, I think.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225531#comment-16225531
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340546565
 
 
   > having ipc should be a top level package rather than a subpackage under 
vector, i.e. org.apache.arrow.ipc
   
   I'm not sure, all of the current messages are geared towards vectors so it 
makes sense to keep it there.  Are you thinking of possible messages in the 
future that might not be vector related?
   
   




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225530#comment-16225530
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147799085
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java
 ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
 this.root = new VectorSchemaRoot(schema, vectors, 0);
 this.loader = new VectorLoader(root);
 this.dictionaries = Collections.unmodifiableMap(dictionaries);
+
+// Read and load all dictionaries from schema
+for (int i = 0; i < dictionaries.size(); i++) {
 
 Review comment:
   yeah, an overloaded method would be fine. I agree that having to load a 
batch before reading dictionaries is a bit confusing for the general use case.




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225520#comment-16225520
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

BryanCutler commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147797804
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java
 ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
 this.root = new VectorSchemaRoot(schema, vectors, 0);
 this.loader = new VectorLoader(root);
 this.dictionaries = Collections.unmodifiableMap(dictionaries);
+
+// Read and load all dictionaries from schema
+for (int i = 0; i < dictionaries.size(); i++) {
 
 Review comment:
   Yeah, we could still do that.  I think it just comes down to either reading 
the dictionaries after the schema, or reading them before the first data batch. 
 I thought it made a little more sense to read them with the schema, otherwise 
the user could create the reader, load the schema and try to decode it but fail.
   
   Would it work for you to maybe overload `ArrowReader.readSchema` which will 
be able to return the original schema before loading the dictionaries?  
Similarly, if using the stream format, you could make a subclass of 
`MessageReader` (introduced here) and react after reading a schema message. If 
not, I'm ok with reading them before data batches and documenting for the user 
that you can't decode until batches are read.
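
   To make the overload idea above concrete, here is a minimal sketch of a reader that can return the schema on its own and only consumes dictionary messages when asked to, or when the first batch is requested. It is an illustration under stated assumptions, not the code in this pull request: `ToyMessage`, `ToyReader` and `ToyReaderDemo` are hypothetical stand-ins, messages are plain strings, and the toy reader simply consumes consecutive dictionary messages instead of taking the expected count from the schema.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a framed message: a kind tag plus an opaque body.
final class ToyMessage {
  final String kind;
  final String body;

  ToyMessage(String kind, String body) {
    this.kind = kind;
    this.body = body;
  }
}

// Hypothetical reader; not the Arrow ArrowReader API.
final class ToyReader {
  private final Deque<ToyMessage> in;
  private final List<String> dictionaries = new ArrayList<>();
  private String schema;
  private boolean dictionariesLoaded;

  ToyReader(List<ToyMessage> messages) {
    this.in = new ArrayDeque<>(messages);
  }

  // Reads only the schema message; dictionaries and batches stay unread.
  String readSchema() {
    if (schema == null) {
      ToyMessage first = in.poll();
      if (first == null || !first.kind.equals("schema")) {
        throw new IllegalStateException("expected a schema message first");
      }
      schema = first.body;
    }
    return schema;
  }

  // Overload: optionally load the dictionaries right after the schema.
  String readSchema(boolean loadDictionaries) {
    String result = readSchema();
    if (loadDictionaries) {
      loadDictionaries();
    }
    return result;
  }

  // Consumes the run of dictionary messages that follows the schema.
  private void loadDictionaries() {
    if (dictionariesLoaded) {
      return;
    }
    while (in.peek() != null && in.peek().kind.equals("dictionary")) {
      dictionaries.add(in.poll().body);
    }
    dictionariesLoaded = true;
  }

  int dictionaryCount() {
    return dictionaries.size();
  }

  // Returns the next batch body, or null at end of stream.
  // Dictionaries are loaded first so a returned batch can always be decoded.
  String nextBatch() {
    readSchema(true);
    ToyMessage next = in.poll();
    if (next == null) {
      return null;
    }
    if (!next.kind.equals("record_batch")) {
      throw new IllegalStateException("unexpected message kind: " + next.kind);
    }
    return next.body;
  }
}

final class ToyReaderDemo {
  static ToyReader sample() {
    return new ToyReader(Arrays.asList(
        new ToyMessage("schema", "schema-0"),
        new ToyMessage("dictionary", "dict-0"),
        new ToyMessage("dictionary", "dict-1"),
        new ToyMessage("record_batch", "batch-0")));
  }

  public static void main(String[] args) {
    ToyReader reader = sample();
    System.out.println(reader.readSchema());      // schema only, no dictionaries yet
    System.out.println(reader.dictionaryCount()); // 0
    System.out.println(reader.nextBatch());       // loads dictionaries, then batch-0
    System.out.println(reader.dictionaryCount()); // 2
  }
}
```

   With this shape, a caller who only wants the schema never pays for dictionary loading, while a caller who goes straight to batches still gets dictionaries loaded before anything needs decoding.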




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225348#comment-16225348
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

icexelloss commented on issue #1259: ARROW-1047: [Java] Add Generic Reader 
Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340517275
 
 
   At a high level, @BryanCutler, how do you feel about making `ipc` a 
top-level package rather than a subpackage under `vector`, i.e. 
`org.apache.arrow.ipc`?




[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224883#comment-16224883
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

elahrvivaz commented on a change in pull request #1259: ARROW-1047: [Java] Add 
Generic Reader Interface for Stream Format
URL: https://github.com/apache/arrow/pull/1259#discussion_r147695368
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowReader.java
 ##
 @@ -216,9 +171,32 @@ private void initialize() throws IOException {
 this.root = new VectorSchemaRoot(schema, vectors, 0);
 this.loader = new VectorLoader(root);
 this.dictionaries = Collections.unmodifiableMap(dictionaries);
+
+// Read and load all dictionaries from schema
+for (int i = 0; i < dictionaries.size(); i++) {
 
 Review comment:
   Sometimes it's useful to be able to read just the schema out of a message, 
without loading any dictionaries or record batches. Is there a way to 
preserve that functionality?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org







[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-10-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224275#comment-16224275
 ] 

ASF GitHub Bot commented on ARROW-1047:
---

wesm commented on issue #1259: ARROW-1047: [Java] Add Generic Reader Interface 
for Stream Format
URL: https://github.com/apache/arrow/pull/1259#issuecomment-340318241
 
 
   @BryanCutler, at a high level this sounds great to me. cc @nongli to 
take a look as well.









[jira] [Commented] (ARROW-1047) [Java] Add generalized stream writer and reader interfaces that are decoupled from IO / message framing

2017-05-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014648#comment-16014648
 ] 

Wes McKinney commented on ARROW-1047:
-

The benefit of this work is that stream readers and writers would not need to 
know about the underlying transport (whether the messages are being written 
directly to a byte channel, or placed in a queue to be sent asynchronously 
through some RPC protocol). 
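
As a rough sketch of that decoupling, the writer below only hands complete 
messages to a sink and never touches IO itself. Every name here (MessageSink, 
ByteChannelSink, QueueSink, SketchStreamWriter) is hypothetical and not part of 
the Arrow Java API, and the 4-byte length prefix is just one illustrative way 
to frame messages on a byte channel, not the Arrow stream encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.List;
import java.util.Queue;

/** Hypothetical transport abstraction: accepts one complete message at a time. */
interface MessageSink {
    void send(byte[] message) throws IOException;
}

/** Frames each message with a length prefix and writes it to a byte channel. */
final class ByteChannelSink implements MessageSink {
    private final WritableByteChannel out;

    ByteChannelSink(WritableByteChannel out) {
        this.out = out;
    }

    @Override
    public void send(byte[] message) throws IOException {
        ByteBuffer prefix = ByteBuffer.allocate(4);
        prefix.putInt(message.length);
        prefix.flip();
        writeFully(prefix);
        writeFully(ByteBuffer.wrap(message));
    }

    // A channel may accept fewer bytes than offered, so loop until drained.
    private void writeFully(ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            out.write(buffer);
        }
    }
}

/** Places each message in a queue, e.g. for an RPC layer to send later. */
final class QueueSink implements MessageSink {
    private final Queue<byte[]> queue;

    QueueSink(Queue<byte[]> queue) {
        this.queue = queue;
    }

    @Override
    public void send(byte[] message) {
        queue.add(message.clone());
    }
}

/** Writer that emits schema then batches without knowing the transport. */
final class SketchStreamWriter {
    private final MessageSink sink;

    SketchStreamWriter(MessageSink sink) {
        this.sink = sink;
    }

    void writeStream(byte[] schema, List<byte[]> batches) throws IOException {
        sink.send(schema);
        for (byte[] batch : batches) {
            sink.send(batch);
        }
    }
}
```

The same SketchStreamWriter works unchanged with either sink, which is the 
property the comment above is after.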



