[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-13 Thread Philipp Moritz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240621#comment-15240621
 ] 

Philipp Moritz commented on ARROW-100:
--

You were right, I was using an old version of clang-tidy. After updating it, 
everything worked out nicely, the pull request should be complete now.

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-13 Thread Micah Kornfield (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238717#comment-15238717
 ] 

Micah Kornfield commented on ARROW-100:
---

hmm what version, of clang-tidy do you have installed?  I thought I fixed this 
issue (at least for clang-tidy-3.7) when I checked in the initial clang-tidy 
check.  It looks like it is following the path that the assertion should 
checking for, but it doesn't realize that FatalLogger never returns after 
destruction.  

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-13 Thread Philipp Moritz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238688#comment-15238688
 ] 

Philipp Moritz commented on ARROW-100:
--

Just added the test and fixed a bug that the test discovered (I need to compute 
the actual offset, passing in 0 is not enough); I also addressed your comments 
(didn't change the 1<<16 because I don't want the test to depend on code that 
is supposed to be tested). Thanks a lot for your help!

The issues clang-tidy found are largely unrelated to what I have been doing, 
but there is one the was introduced by the code I wrote that I don't really 
understand:

https://gist.github.com/pcmoritz/0ec7421c60140eb47fccd770c2cd280d

Any chance you can look at it and help me figure out what is going on?

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Micah Kornfield (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238517#comment-15238517
 ] 

Micah Kornfield commented on ARROW-100:
---

Also, in the existing unit test there is an approximate size (1<<16) you could 
replace that as well.

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Philipp Moritz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238481#comment-15238481
 ] 

Philipp Moritz commented on ARROW-100:
--

Thanks!

The implementation is here: https://github.com/pcmoritz/arrow/tree/rowbatchsize

I'd like to write a test, is there any way to see how many bytes have been 
written to the MemorySource? And is it enough to create a PR in git to 
contribute the code back?


> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Micah Kornfield (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238413#comment-15238413
 ] 

Micah Kornfield commented on ARROW-100:
---

I don't have much experience with flatbuffers but I think it should be pretty 
fast (the google benchmark library is part of the toolchain if you want to 
write a benchmark to prove this).  I can't think of a better way to estimate 
its size, and I think keeping the implementation simple and concise makes sense 
until we have numbers to prove that it is a performance bottleneck.   If it is 
a bottleneck, we could probably mitigate it by keeping around the serialized 
version for writing (I think the most common use-case for this method will be 
call GetRowBatchSize(), allocate the necessary shared memory memory, then push 
the buffers to the shared-memory).

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Philipp Moritz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238398#comment-15238398
 ] 

Philipp Moritz commented on ARROW-100:
--

Thanks for your answers!

Looking into it a bit more, it should be as simple as implementing 
DataHeaderSize (by calling WriteDataHeader and getting the resulting buffer 
size) and adding TotalBytes to its result. Assembling the Flatbuffer header 
should be fast enough, or do you see a better way to estimate the size?

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Micah Kornfield (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238386#comment-15238386
 ] 

Micah Kornfield commented on ARROW-100:
---

Welcome Philip.

Your approach sounds correct to me.  Most of the necessary code is already in 
ipc/adapter.cc (see RowBatchWriter), I think it is just a matter of factoring 
it a little better.

I'm working on some unrelated changes  to the same file but I don't think we 
will conflict (making it possible to send arrays across).

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-100) [C++] Computing RowBatch size

2016-04-12 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238304#comment-15238304
 ] 

Kai Zheng commented on ARROW-100:
-

Sounds good Philipp. I refined the title to avoid confusion.

> [C++] Computing RowBatch size
> -
>
> Key: ARROW-100
> URL: https://issues.apache.org/jira/browse/ARROW-100
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Philipp Moritz
>
> Hi,
> thank you guys for this project, I'm really enjoying what I've seen so far!
> There is an unimplemented method for getting the total size of objects:
> int64_t GetRowBatchSize(const RowBatch* batch);
> Has somebody already started to implement it or thought about how to do it?
> It could be done by recursively adding up all the involved buffer sizes, 
> build the metadata and add its size. Let me know if you want me to create a 
> draft of the implementation.
> -- Philipp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)