[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240621#comment-15240621 ] Philipp Moritz commented on ARROW-100: -- You were right, I was using an old version of clang-tidy. After updating it, everything worked out nicely, the pull request should be complete now. > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238717#comment-15238717 ] Micah Kornfield commented on ARROW-100: --- hmm what version, of clang-tidy do you have installed? I thought I fixed this issue (at least for clang-tidy-3.7) when I checked in the initial clang-tidy check. It looks like it is following the path that the assertion should checking for, but it doesn't realize that FatalLogger never returns after destruction. > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238688#comment-15238688 ] Philipp Moritz commented on ARROW-100: -- Just added the test and fixed a bug that the test discovered (I need to compute the actual offset, passing in 0 is not enough); I also addressed your comments (didn't change the 1<<16 because I don't want the test to depend on code that is supposed to be tested). Thanks a lot for your help! The issues clang-tidy found are largely unrelated to what I have been doing, but there is one the was introduced by the code I wrote that I don't really understand: https://gist.github.com/pcmoritz/0ec7421c60140eb47fccd770c2cd280d Any chance you can look at it and help me figure out what is going on? > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238517#comment-15238517 ] Micah Kornfield commented on ARROW-100: --- Also, in the existing unit test there is an approximate size (1<<16) you could replace that as well. > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238481#comment-15238481 ] Philipp Moritz commented on ARROW-100: -- Thanks! The implementation is here: https://github.com/pcmoritz/arrow/tree/rowbatchsize I'd like to write a test, is there any way to see how many bytes have been written to the MemorySource? And is it enough to create a PR in git to contribute the code back? > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238413#comment-15238413 ] Micah Kornfield commented on ARROW-100: --- I don't have much experience with flatbuffers but I think it should be pretty fast (the google benchmark library is part of the toolchain if you want to write a benchmark to prove this). I can't think of a better way to estimate its size, and I think keeping the implementation simple and concise makes sense until we have numbers to prove that it is a performance bottleneck. If it is a bottleneck, we could probably mitigate it by keeping around the serialized version for writing (I think the most common use-case for this method will be call GetRowBatchSize(), allocate the necessary shared memory memory, then push the buffers to the shared-memory). > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238398#comment-15238398 ] Philipp Moritz commented on ARROW-100: -- Thanks for your answers! Looking into it a bit more, it should be as simple as implementing DataHeaderSize (by calling WriteDataHeader and getting the resulting buffer size) and adding TotalBytes to its result. Assembling the Flatbuffer header should be fast enough, or do you see a better way to estimate the size? > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238386#comment-15238386 ] Micah Kornfield commented on ARROW-100: --- Welcome Philip. Your approach sounds correct to me. Most of the necessary code is already in ipc/adapter.cc (see RowBatchWriter), I think it is just a matter of factoring it a little better. I'm working on some unrelated changes to the same file but I don't think we will conflict (making it possible to send arrays across). > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size
[ https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238304#comment-15238304 ] Kai Zheng commented on ARROW-100: - Sounds good Philipp. I refined the title to avoid confusion. > [C++] Computing RowBatch size > - > > Key: ARROW-100 > URL: https://issues.apache.org/jira/browse/ARROW-100 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Philipp Moritz > > Hi, > thank you guys for this project, I'm really enjoying what I've seen so far! > There is an unimplemented method for getting the total size of objects: > int64_t GetRowBatchSize(const RowBatch* batch); > Has somebody already started to implement it or thought about how to do it? > It could be done by recursively adding up all the involved buffer sizes, > build the metadata and add its size. Let me know if you want me to create a > draft of the implementation. > -- Philipp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)