[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch
[ https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790999#comment-16790999 ]

Wes McKinney commented on ARROW-2119:
-------------------------------------

The patch I put up is full of failures. I'm doubtful this can be resolved in time for 0.13

> [C++][Java] Handle Arrow stream with zero record batch
> ------------------------------------------------------
>
>                 Key: ARROW-2119
>                 URL: https://issues.apache.org/jira/browse/ARROW-2119
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Jingyuan Wang
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like currently many places of the code assume that there needs to be
> at least one record batch for streaming format. Is zero-recordbatch not
> supported by design?
> e.g.
> [https://github.com/apache/arrow/blob/master/java/tools/src/main/java/org/apache/arrow/tools/StreamToFile.java#L45]
> {code:none}
> public static void convert(InputStream in, OutputStream out) throws IOException {
>   BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
>   try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator)) {
>     VectorSchemaRoot root = reader.getVectorSchemaRoot();
>     // load the first batch before instantiating the writer so that we have any dictionaries
>     if (!reader.loadNextBatch()) {
>       throw new IOException("Unable to read first record batch");
>     }
>     ...
> {code}
> Pyarrow-0.8.0 does not load 0-recordbatch stream either. It would throw an
> exception originated from
> [https://github.com/apache/arrow/blob/a95465b8ce7a32feeaae3e13d0a64102ffa590d9/cpp/src/arrow/table.cc#L309:]
> {code:none}
> Status Table::FromRecordBatches(const std::vector<std::shared_ptr<RecordBatch>>& batches,
>                                 std::shared_ptr<Table>* table) {
>   if (batches.size() == 0) {
>     return Status::Invalid("Must pass at least one record batch");
>   }
>   ...{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
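For illustration only, here is a minimal sketch of the behaviour the issue description asks for: a stream that carries a schema but zero record batches is copied as an empty stream instead of being rejected. It deliberately does not use the Arrow Java API. BatchReader, RecordingWriter, readerOf and convert are hypothetical stand-ins invented for this sketch, a "batch" is just a list of integers, and dictionaries (the reason the quoted StreamToFile code loads the first batch early) are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, NOT the Arrow API: a "batch" here is just a list of ints.
final class ZeroBatchStreamSketch {

    /** Hypothetical reader: the schema is known up front, batches are pulled one at a time. */
    interface BatchReader {
        List<String> schema();

        /** Returns the next batch, or null once the stream is exhausted. */
        List<Integer> nextBatch();
    }

    /** Hypothetical writer that simply records what it was asked to write. */
    static final class RecordingWriter {
        List<String> schema;
        final List<List<Integer>> batches = new ArrayList<>();
        boolean closed;

        void start(List<String> schema) { this.schema = schema; }
        void writeBatch(List<Integer> batch) { batches.add(batch); }
        void end() { closed = true; }
    }

    /** Builds an in-memory reader over a fixed schema and list of batches. */
    static BatchReader readerOf(List<String> schema, List<List<Integer>> batches) {
        Iterator<List<Integer>> it = batches.iterator();
        return new BatchReader() {
            public List<String> schema() { return schema; }
            public List<Integer> nextBatch() { return it.hasNext() ? it.next() : null; }
        };
    }

    /**
     * Copies the schema and then every batch. The schema is written before any
     * batch is requested, so an immediately exhausted reader produces a valid
     * empty output rather than an error. Returns the number of batches copied.
     */
    static int convert(BatchReader reader, RecordingWriter writer) {
        writer.start(reader.schema());
        int count = 0;
        for (List<Integer> batch = reader.nextBatch(); batch != null; batch = reader.nextBatch()) {
            writer.writeBatch(batch);
            count++;
        }
        writer.end();
        return count;
    }

    public static void main(String[] args) {
        RecordingWriter out = new RecordingWriter();
        int copied = convert(readerOf(Arrays.asList("id"), new ArrayList<List<Integer>>()), out);
        System.out.println("batches copied: " + copied + ", schema written: " + out.schema);
    }
}
```

The design point is only the ordering: nothing in the loop treats "no first batch" as a failure. Whether the real readers and writers can be restructured this way, given dictionary handling, is exactly what the issue leaves open.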
[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch
[ https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779569#comment-16779569 ]

Wes McKinney commented on ARROW-2119:
-------------------------------------

This can be resolved by adding a zero-record-batch stream to the integration tests

> [C++][Java] Handle Arrow stream with zero record batch
> ------------------------------------------------------
>
>                 Key: ARROW-2119
>                 URL: https://issues.apache.org/jira/browse/ARROW-2119
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Jingyuan Wang
>            Priority: Major
>             Fix For: 0.13.0
[jira] [Commented] (ARROW-2119) [C++][Java] Handle Arrow stream with zero record batch
[ https://issues.apache.org/jira/browse/ARROW-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761458#comment-16761458 ]

Wes McKinney commented on ARROW-2119:
-------------------------------------

I think this might be fixed. I added it to 0.13 to check C++ at least

> [C++][Java] Handle Arrow stream with zero record batch
> ------------------------------------------------------
>
>                 Key: ARROW-2119
>                 URL: https://issues.apache.org/jira/browse/ARROW-2119
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Jingyuan Wang
>            Priority: Major
>             Fix For: 0.13.0