Re: [capnproto] Random access clarified

2023-09-05 Thread 'Kenton Varda' via Cap'n Proto
On Tue, Sep 5, 2023 at 6:08 AM Johannes Dröge  wrote:

> Great, thanks for the clear statement!
>
> I have some follow-up questions now:
>
> 1) What is the memory usage pattern for the deserialization of such members
> when they are accessed (either from disk or in memory)? I suppose that the
> accessed element must be converted to a hardware-specific representation in
> memory to be usable, right? I assume that such a copy will exist in memory
> when used but no other copies.
>

The Cap'n Proto wire encoding is documented on the web site:

https://capnproto.org/encoding.html

As you'll see, there is no need to translate to a "hardware-specific
representation", as the wire representation is already designed to be
agreeable to all modern hardware without translation. This is the core
design goal of Cap'n Proto serialization.
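
To illustrate what "no translation" means in practice, here is a minimal sketch using only the Python standard library. It assumes a made-up 16-byte layout (a UInt32 at offset 0 and a Float64 at offset 8, both little-endian); the layout and the helper names `read_u32` and `read_f64` are hypothetical and are not generated from a real schema or taken from the Cap'n Proto library.

```python
import struct

# Hypothetical data section: UInt32 at byte 0, 4 padding bytes, Float64 at byte 8.
# Fixed-width little-endian values at fixed offsets, as in the wire encoding.
buf = struct.pack("<I4xd", 42, 1.5)


def read_u32(buffer, offset):
    """Read a little-endian unsigned 32-bit integer directly from the buffer."""
    return struct.unpack_from("<I", buffer, offset)[0]


def read_f64(buffer, offset):
    """Read a little-endian 64-bit float directly from the buffer."""
    return struct.unpack_from("<d", buffer, offset)[0]


# Each getter is a single load at a known offset; nothing else is parsed.
value_a = read_u32(buf, 0)
value_b = read_f64(buf, 8)
```

The point of the sketch is only that a field read is a load at a known offset, with no intermediate decoded copy of the message.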

When reading a message in Cap'n Proto, the backing buffer is only accessed
on-demand when you call the getter method to get a field. There is no
preprocessing at all. When you call the getter method for a pointer field,
only the pointer itself is read, in order to construct a Reader object; the
destination data is not read until you call methods on the Reader object to
read it.
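
The on-demand behaviour described above can be sketched as follows. This is a simplified model, not the real implementation: the "pointer" here is a plain absolute byte offset (real Cap'n Proto pointers are encoded differently, see the encoding page), and `TrackingBuffer`, `RootReader` and `ChildReader` are hypothetical names introduced only to record which bytes get touched.

```python
import struct


class TrackingBuffer:
    """Wraps bytes and records which (offset, size) ranges were actually read."""

    def __init__(self, data):
        self._data = data
        self.reads = []

    def unpack(self, fmt, offset):
        self.reads.append((offset, struct.calcsize(fmt)))
        return struct.unpack_from(fmt, self._data, offset)[0]


class ChildReader:
    """Reader for the pointed-to object; constructing it reads nothing."""

    def __init__(self, buf, offset):
        self._buf = buf
        self._offset = offset

    def get_value(self):
        return self._buf.unpack("<q", self._offset)


class RootReader:
    def __init__(self, buf):
        self._buf = buf

    def get_child(self):
        # Only the 8-byte pointer word is read here, not the target data.
        target = self._buf.unpack("<Q", 8)
        return ChildReader(self._buf, target)


# Simplified layout: u32 at 0, padding, u64 "pointer" at 8, child i64 at 16.
buf = TrackingBuffer(struct.pack("<I4xQq", 7, 16, -123))
root = RootReader(buf)
reads_after_construct = list(buf.reads)  # nothing read yet
child = root.get_child()
reads_after_pointer = list(buf.reads)    # only the pointer word
child_value = child.get_value()
reads_after_value = list(buf.reads)      # now the target word as well
```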


> 2) For embedded binary (aka Data) objects, I assume that no real
> deserialization is actually needed. Is there a way to read such objects in
> a stream-like fashion to avoid putting them into memory entirely?
>

When reading a byte array, you essentially get a pointer into the backing
buffer; none of the bytes are accessed until your code does the accessing.
If you are reading from an mmapped file, then the pages will only be loaded
into memory when you access them, and the kernel can automatically unload
pages later when it needs memory for something else. There's nothing
special you need to do to achieve "streaming" in this case.
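
As a rough illustration of the mmap case, the sketch below uses only Python's standard `mmap` and `memoryview`, not the Cap'n Proto library. The file contents and the 16-byte window are arbitrary choices for the example. Slicing the memoryview does not copy; only the final `bytes(window)` call copies, and only the 16 bytes requested.

```python
import mmap
import os
import tempfile

# Write a 16 KiB scratch file standing in for a large serialized message.
payload = bytes(range(256)) * 64
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        with memoryview(mapped) as view:
            # Zero-copy slice: a window into the mapping, no bytes read yet
            # by this code. The kernel pages data in when it is touched.
            with view[4096:4096 + 16] as window:
                first_bytes = bytes(window)  # copies just these 16 bytes

os.remove(path)
```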

If you are reading a message from the network, though, it is necessary for
the entire message to arrive in memory before you can begin accessing it.
To achieve "streaming" from the network, you need to design your
application to send chunks of the stream as separate RPC calls.
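
A minimal sketch of the chunking idea, under stated assumptions: `iter_chunks` and `ChunkReceiver` are hypothetical names, and `ChunkReceiver.write` merely stands in for whatever RPC method an application would define to accept one chunk per call. No real RPC API is used here.

```python
def iter_chunks(data, chunk_size):
    """Yield zero-copy windows of at most chunk_size bytes over data."""
    view = memoryview(data)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


class ChunkReceiver:
    """Stand-in for the receiving side of a hypothetical write(chunk) RPC."""

    def __init__(self):
        self.total = 0
        self.calls = 0

    def write(self, chunk):
        # A real receiver would process or store the chunk here, so that
        # the whole stream never has to sit in memory at once.
        self.total += len(chunk)
        self.calls += 1


receiver = ChunkReceiver()
blob = bytes(10_000)
for chunk in iter_chunks(blob, 4096):
    receiver.write(chunk)  # in a real application: one RPC call per chunk
```

Each chunk is then a small message of its own, so only one chunk needs to be fully in memory on the receiving side at a time.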


> 3) I assume that the Python interface works the same way. Are you aware of
> any limitations?
>

The Python implementation wraps the C++ implementation, so it should broadly
work the same, but not all APIs are exposed. I don't personally maintain
the Python code so I can't really answer detailed questions about what it
can or can't do, sorry.

-Kenton


>
> Thanks for your support, I really enjoy that piece of software and hope
> that I can also use it for RPC in the future!
> Kenton Varda wrote on Monday, September 4, 2023 at 17:47:22 UTC+2:
>
>> Hi Johannes,
>>
>> Yes, it applies to list indexing.
>>
>> -Kenton
>>
>> On Mon, Sep 4, 2023 at 10:43 AM Johannes Dröge  wrote:
>>
>>> Hi there!
>>>
>>> The FAQ states "Random access: You can read just one field of a
>>> message without parsing the whole thing". However, does that also
>>> apply to List indexing? I have a flexible-length list of potentially
>>> large objects, and I need to access the nth list element from disk without
>>> having to hold other elements in memory.
>>>
>>> I started using capnp for internal serialization in a prototype, with a
>>> more dynamic approach to data types and data structures. For this, I'm
>>> mostly attracted by the fast implementation and random access option, which
>>> gives me the possibility to mmap data structures to lazy-load attributes
>>> from disk. I'm currently using the Python interface but I might switch to
>>> C++, Rust or Go at a later stage.
>>>
>>> I will try to profile this with a toy example. Nevertheless, I'd be
>>> thankful for a theoretical consideration here!
>>>
>>> Cheers
>>> Johannes
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to capnproto+...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/capnproto/48035174-9253-48d3-ad3d-b3fe69d249a3n%40googlegroups.com
>>> 
>>> .
>>>

Re: [capnproto] Random access clarified

2023-09-05 Thread Johannes Dröge
Great, thanks for the clear statement!

I have some follow-up questions now:

1) What is the memory usage pattern for the deserialization of such members 
when they are accessed (either from disk or in memory)? I suppose that the 
accessed element must be converted to a hardware-specific representation in 
memory to be usable, right? I assume that such a copy will exist in memory 
when used but no other copies.

2) For embedded binary (aka Data) objects, I assume that no real 
deserialization is actually needed. Is there a way to read such objects in 
a stream-like fashion to avoid putting them into memory entirely?

3) I assume that the Python interface works the same way. Are you aware of 
any limitations?

Thanks for your support, I really enjoy that piece of software and hope 
that I can also use it for RPC in the future!
Kenton Varda wrote on Monday, September 4, 2023 at 17:47:22 UTC+2:

> Hi Johannes,
>
> Yes, it applies to list indexing.
>
> -Kenton
>
> On Mon, Sep 4, 2023 at 10:43 AM Johannes Dröge  wrote:
>
>> Hi there!
>>
>> The FAQ states "Random access: You can read just one field of a 
>> message without parsing the whole thing". However, does that also apply 
>> to List indexing? I have a flexible-length list of potentially large 
>> objects, and I need to access the nth list element from disk without having 
>> to hold other elements in memory.
>>
>> I started using capnp for internal serialization in a prototype, with a 
>> more dynamic approach to data types and data structures. For this, I'm 
>> mostly attracted by the fast implementation and random access option, which 
>> gives me the possibility to mmap data structures to lazy-load attributes 
>> from disk. I'm currently using the Python interface but I might switch to 
>> C++, Rust or Go at a later stage.
>>
>> I will try to profile this with a toy example. Nevertheless, I'd be 
>> thankful for a theoretical consideration here!
>>
>> Cheers
>> Johannes
>>
>
