Re: [capnproto] Performance of iterating through List vs. raw Data

2022-09-21 Thread 'Kenton Varda' via Cap'n Proto
On Wed, Sep 21, 2022 at 9:42 AM Hui min  wrote:

> I have a related question and found this thread.
>
> From what I understand of the discussion, if I have
>
> struct Vector3d {
>   x @0 :Float64;
>   y @1 :Float64;
>   z @2 :Float64;
> }
>
> and then
>
> struct Point3Array {
>   data @0 :List(Vector3d);
> }
>
> will I get a perfectly packed array of Vector3d in memory, since it is
> already 8-byte aligned?
>

Yes, in practice, given this schema, the encoder will produce a perfectly
packed array of floats in little-endian byte order.

But note that if you *assume* this to be the case on the reading end, then
you are technically violating Cap'n Proto parsing rules. A Cap'n Proto
parser would normally be expected to accept data where `Vector3d` has been
extended with new fields (newer version of the schema), or where the
trailing fields aren't present (older version of the schema). In these
cases you might not get a perfectly packed array of triplets. You can, of
course, make a rule that these are not allowed for your specific
application and write your code to reject them. I'm just pointing out that
it's a possible difference from what Cap'n Proto's APIs would normally do.
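
A reader that wants to rely on the packed layout could make that rule
explicit along the following lines. This is only a sketch in plain C++ with
no Cap'n Proto calls: `Vec3`, `loadLeFloat64`, `readPackedVector3d` and the
`dataBytes` / `pointerCount` parameters are hypothetical names, and in real
code the two sizes would be taken from the first list element. It assumes
`double` is a 64-bit IEEE-754 type whose byte order matches the host's
integer byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == 8, "sketch assumes 64-bit IEEE-754 double");

struct Vec3 { double x, y, z; };  // hypothetical host-side type

// Decode one little-endian Float64 without depending on host byte order.
inline double loadLeFloat64(const unsigned char* p) {
  std::uint64_t bits = 0;
  for (int i = 7; i >= 0; --i) bits = (bits << 8) | p[i];
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Returns false (leaving `out` untouched) unless each element is exactly
// three Float64 values and no pointers, i.e. the layout of this schema
// version. Older or newer schema versions are rejected, not reinterpreted.
inline bool readPackedVector3d(const unsigned char* first, std::size_t count,
                               std::size_t dataBytes, std::size_t pointerCount,
                               std::vector<Vec3>& out) {
  if (dataBytes != 24 || pointerCount != 0) return false;
  assert(first != nullptr || count == 0);
  std::vector<Vec3> result;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    const unsigned char* p = first + i * 24;  // elements are contiguous
    result.push_back(Vec3{loadLeFloat64(p), loadLeFloat64(p + 8),
                          loadLeFloat64(p + 16)});
  }
  out.swap(result);
  return true;
}
```

Rejecting unexpected sizes up front keeps the fast path simple, at the cost
of giving up the schema evolution that the generated accessors would handle.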


> A follow-up question: would the memory layout change if
> I changed the definition of Vector3d to
>
> struct Vector3d {
>   pt @0 :List(Float64);
> }
>
> and always inserted 3 elements into the list? I guess this won't work as
> expected, as a List is inherently a variable-length entity.
>

As you guessed, this case is different. A `List` field is encoded as a
pointer to an external list which can have a different length for each
Vector3d.
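
To make the difference concrete, here is a rough size calculation for N
points under each schema. It is a sketch that assumes the usual Cap'n Proto
encoding rules (8-byte words, one tag word in front of a struct list, one
pointer word per `List` field) and ignores the pointer held by
`Point3Array` itself; the function names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kWordBytes = 8;

// List(Vector3d) with three inline Float64 fields:
// one tag word, then 3 data words per element, all contiguous.
constexpr std::size_t inlineLayoutBytes(std::size_t n) {
  return (1 + 3 * n) * kWordBytes;
}

// List(Vector3d) where Vector3d holds `pt :List(Float64)`:
// one tag word and one pointer word per element in the struct list,
// plus a separate 3-word Float64 list per element elsewhere in the message.
constexpr std::size_t pointerLayoutBytes(std::size_t n) {
  return (1 + n + 3 * n) * kWordBytes;
}

static_assert(inlineLayoutBytes(1) == 32, "tag word + 3 data words");
static_assert(pointerLayoutBytes(1) == 40, "tag + pointer + 3 list words");
```

Beyond the extra word per point, the second layout no longer gives one
contiguous run of coordinates, since each point's values sit behind its own
pointer.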

-Kenton


>
>
> On Thursday, 15 August 2019 at 3:25:09 pm UTC+8 Philipp Wissmann wrote:
>
>> Hi Kenton
>>
>> Thanks a lot for this amazingly fast reply and the explanations.
>>
>> We might try the direct pointer approach and it's very useful to know
>> this possibility but for now I think having the data in a Data member seems
>> to be working.
>>
>>
>> Cheers!
>> Philipp
>>
>> On Tuesday, August 13, 2019 at 7:43:08 PM UTC+2, Kenton Varda wrote:
>>
>>> Hi Philipp,
>>>
>>> This is a bit of an unusual case, where I imagine you are working with
>>> bulk vector data forming a 3D mesh or some such, and in order to hand it
>>> off to the graphics card, you really need a direct pointer to the
>>> underlying data and you need it to be in a specific layout.
>>>
>>> Using Data in this case might make sense.
>>>
>>> Alternatively, here's one trick I thought of:
>>>
>>> Say your Vector3f is defined as:
>>>
>>> struct Vector3f { x @0 :Float32; y @1 :Float32; z @2 :Float32; }
>>>
>>> Note that Cap'n Proto pads every struct to an 8-byte boundary, so
>>> there's 4 bytes of padding at the end of this struct. If that doesn't work
>>> for you, then I think you have no choice but to use Data. But if the
>>> padding is OK, then here's a way you can get a direct pointer to the data:
>>>
>>> const kj::byte* getRawPointer(capnp::List<Vector3f>::Reader list) {
>>>   if (list.size() == 0) {
>>>     return nullptr;
>>>   } else {
>>>     // View the first element as an untyped struct to inspect its layout.
>>>     capnp::AnyStruct::Reader any(list[0]);
>>>     // Vector3f must have no pointers and exactly 16 bytes of data.
>>>     KJ_REQUIRE(any.getPointerSection().size() == 0);
>>>     KJ_REQUIRE(any.getDataSection().size() == 16);
>>>     return any.getDataSection().begin();
>>>   }
>>> }
>>>
>>> Here, you're getting a direct pointer to the "data section" of the first
>>> struct in the list. Structs in a struct list are always stored
>>> contiguously, so you can extend this out to the size of the list. You do
>>> have to verify that the structs have the expected size, since technically
>>> struct sizes are allowed to change to support schema evolution (in this
>>> case, you'll never be able to add fields to Vector3f, except maybe a `w @3
>>> :Float32` which would use the remaining padding without changing the size).
>>> All structs in a struct list have the same size, so if the first struct
>>> looks good, then the whole list is good.
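
Once such a pointer is available, the padded layout can be walked with a
16-byte stride. The helper below is a hypothetical sketch, not part of the
Cap'n Proto API: `repackVector3f` copies the 12 payload bytes of each
element into a tightly packed float array. It assumes a little-endian host
with 32-bit IEEE-754 `float`, so the bytes can be copied without conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

constexpr std::size_t kStrideBytes = 16;   // 3 x Float32 + 4 bytes of padding
constexpr std::size_t kPayloadBytes = 12;  // x, y, z only

// Copies `count` padded elements starting at `raw` into a packed
// x,y,z,x,y,z,... array, dropping the 4 padding bytes of each element.
inline std::vector<float> repackVector3f(const unsigned char* raw,
                                         std::size_t count) {
  assert(raw != nullptr || count == 0);
  std::vector<float> packed(count * 3);
  for (std::size_t i = 0; i < count; ++i) {
    std::memcpy(packed.data() + i * 3, raw + i * kStrideBytes, kPayloadBytes);
  }
  return packed;
}
```

If the consumer accepts a stride parameter, the copy may be unnecessary and
the raw pointer could be passed along with a stride of 16 instead.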
>>>
>>> Hope that helps.
>>>
>>> -Kenton
>>>
>>> On Tue, Aug 13, 2019 at 9:32 AM  wrote:
>>>
>>>> Hi
>>>>
>>>> We used Cap'n Proto to serialize data in a shared memory environment
>>>> and defined some message types as structs containing a List, i.e.
>>>>
>>>> struct Message {
>>>>   points @0 :List(Vector3f);
>>>> }
>>>>
>>>> where Vector3f is another struct containing either 3 floats or a
>>>> List(Float32). However, extracting the Vector3f's from the List is not
>>>> fast enough for our use cases, as you basically need to use std::copy or
>>>> std::transform on the List. We have now instead replaced the definition
>>>> with
>>>>
>>>> struct Message {
>>>>   points @0 :Data;
>>>> }
>>>>
>>>> and just read and copy the array of bytes. Unfortunately, this
>>>> basically gets rid of the nice schema language of Cap'n Proto. So, what's
>>>> the most efficient way to define and read a contiguous array of the same
>>>> data type?
>>>>
>>>> Best,
>>>> Philipp


Re: [capnproto] Performance of iterating through List vs. raw Data

2022-09-21 Thread Hui min
Another comment: I agree that using raw data as the type is probably more
appropriate, with a metadata field to encode the endianness.

This would be similar
to http://docs.ros.org/en/melodic/api/sensor_msgs/html/msg/PointCloud2.html
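
As a rough illustration of that idea, a reader could decode the raw bytes
according to a byte-order flag stored next to them. The function name
`decodeFloat32Blob` and the `bigEndian` flag are hypothetical; in a schema
the flag would presumably be a Bool field alongside the Data field. The
sketch assumes `float` is a 32-bit IEEE-754 type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Decodes a raw blob of Float32 values whose byte order is given by a
// metadata flag, independent of the host's own byte order.
inline std::vector<float> decodeFloat32Blob(const unsigned char* bytes,
                                            std::size_t size, bool bigEndian) {
  assert(size % 4 == 0);  // the blob must hold whole floats
  std::vector<float> out(size / 4);
  for (std::size_t i = 0; i < out.size(); ++i) {
    const unsigned char* p = bytes + i * 4;
    const std::uint32_t bits =
        bigEndian ? (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                        (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3])
                  : (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
                        (std::uint32_t(p[1]) << 8) | std::uint32_t(p[0]);
    std::memcpy(&out[i], &bits, sizeof bits);
  }
  return out;
}
```

When the flag matches the host byte order, a single bulk copy would do the
same job faster; the per-element path is only needed for the mismatched case.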


-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to capnproto+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/fd4b0e14-4282-47f1-812d-af4da6ad4dabn%40googlegroups.com.


Re: [capnproto] Performance of iterating through List vs. raw Data

2022-09-21 Thread Hui min
I have a related question and found this thread.

From what I understand of the discussion, if I have

struct Vector3d {
  x @0 :Float64;
  y @1 :Float64;
  z @2 :Float64;
}

and then

struct Point3Array {
  data @0 :List(Vector3d);
}

will I get a perfectly packed array of Vector3d in memory, since it is
already 8-byte aligned?

A follow-up question: would the memory layout change if I changed the
definition of Vector3d to

struct Vector3d {
  pt @0 :List(Float64);
}

and always inserted 3 elements into the list? I guess this won't work as
expected, as a List is inherently a variable-length entity.

