[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-03-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790999#comment-16790999
 ] 

Wes McKinney commented on ARROW-2119:
-

The patch I put up is full of failures. I'm doubtful this can be resolved in 
time for 0.13.

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like many places in the code currently assume that a stream in the 
> streaming format contains at least one record batch. Is a stream with zero 
> record batches unsupported by design?
> For example, see 
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
>   public static void convert(InputStream in, OutputStream out) throws IOException {
>     BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>     try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>       VectorSchemaRoot root = reader.getVectorSchemaRoot();
>       // load the first batch before instantiating the writer so that we have any dictionaries
>       if (!reader.loadNextBatch()) {
>         throw new IOException("Unable to read first record batch");
>       }
>       ...
> {code}
> pyarrow 0.8.0 cannot load a stream with zero record batches either. It throws 
> an exception that originates from 
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309]:
> {code:none}
> Status Table::FromRecordBatches(
>     const std::vector<std::shared_ptr<RecordBatch>>& batches,
>     std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
>     return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}
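
One way to picture the behavior the reporter is asking for is a conversion loop that emits the schema first and then copies however many batches arrive, including none. The following is only a minimal sketch under stated assumptions: `StreamReader`, `currentBatch`, and the string-based "output" are hypothetical stand-ins written for this illustration, not the Arrow Java API, and the sketch ignores dictionaries, which is the reason the original `StreamToFile` code loads the first batch early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Illustration only: hypothetical stand-ins, NOT Arrow classes. */
public class ZeroBatchSketch {

    /** Minimal reader model: a schema plus zero or more record batches. */
    static final class StreamReader {
        private final String schema;
        private final Iterator<String> batches;
        private String current;

        StreamReader(String schema, List<String> batches) {
            this.schema = schema;
            this.batches = batches.iterator();
        }

        String getSchema() {
            return schema;
        }

        /** Returns false at end of stream, which may be immediately. */
        boolean loadNextBatch() {
            if (!batches.hasNext()) {
                return false;
            }
            current = batches.next();
            return true;
        }

        String currentBatch() {
            return current;
        }
    }

    /**
     * Copies the schema, then every batch that is present.
     * An empty stream is not an error; it yields schema-only output.
     */
    static List<String> convert(StreamReader reader) {
        List<String> out = new ArrayList<>();
        out.add("schema:" + reader.getSchema());
        while (reader.loadNextBatch()) {
            out.add("batch:" + reader.currentBatch());
        }
        out.add("footer");
        return out;
    }

    public static void main(String[] args) {
        // Zero batches: prints [schema:a:int32, footer]
        System.out.println(convert(
            new StreamReader("a:int32", Collections.<String>emptyList())));
        // Two batches: prints [schema:a:int32, batch:b0, batch:b1, footer]
        System.out.println(convert(
            new StreamReader("a:int32", Arrays.asList("b0", "b1"))));
    }
}
```

The design point is that the schema alone is enough to start the output, so "no first batch" is treated as a normal end of stream rather than an `IOException`. Whether the real tools can be restructured this way depends on how dictionaries are handled, which this sketch does not model.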



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-02-27 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779569#comment-16779569
 ] 

Wes McKinney commented on ARROW-2119:
-

This can be resolved by adding a zero-record-batch stream to the integration 
tests.

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Priority: Major
> Fix For: 0.13.0
>
>





[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch

2019-02-05 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761458#comment-16761458
 ] 

Wes McKinney commented on ARROW-2119:
-

I think this might already be fixed. I added it to 0.13 so that we check C++ at 
least.

> [C++][Java] Handle Arrow stream with zero record batch
> --
>
> Key: ARROW-2119
> URL: https://issues.apache.org/jira/browse/ARROW-2119
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java
>Reporter: Jingyuan Wang
>Priority: Major
> Fix For: 0.13.0
>
>


