On Fri, Jan 15, 2016 at 4:37 PM, Austin Schuh <aus...@peloton-tech.com>
wrote:

>
>
> On Fri, Jan 15, 2016 at 1:30 PM Feng Xiao <xiaof...@google.com> wrote:
>
>> On Fri, Jan 15, 2016 at 11:50 AM, Austin Schuh <aus...@peloton-tech.com>
>> wrote:
>>
>>> On Fri, Jan 15, 2016 at 11:32 AM Feng Xiao <xiaof...@google.com> wrote:
>>>
>>>> On Thu, Jan 14, 2016 at 6:06 PM, Austin Schuh <aus...@peloton-tech.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've got an application where I can't allocate memory while using
>>>>> protobufs.  Arenas have been awesome for doing that.  I'm able to allocate
>>>>> a big block of memory at startup time or stack allocate memory for the
>>>>> arena, and then use that for allocating protobufs.  Thanks!
>>>>>
>>>>> I'd like to be able to allocate strings in the arena.  I'm willing to
>>>>> do the implementation, and wouldn't mind up-streaming if my implementation
>>>>> is complete enough and there is interest.  It looks like I should start by
>>>>> implementing ctype=STRING_PIECE and then allocate memory in the arena to
>>>>> back it.  The class in //src/google/protobuf:arenastring.h looks like the
>>>>> place to do all the operations.  It looks like I need to modify the
>>>>> interface to provide setters and getters to support STRING_PIECE there.
>>>>>
>>>>> Is that the right place to start?  Is there any more guidance that you
>>>>> can give me?
>>>>>
>>>> Hi Austin,
>>>>
>>>> Thanks for contacting us and offering help!
>>>>
>>>> You are looking at the right direction. We actually already opensourced
>>>> the StringPiece implementation not very long ago:
>>>>
>>>> https://github.com/google/protobuf/blob/master/src/google/protobuf/stubs/stringpiece.h
>>>>
>>>> It's intended to be used to implement "ctype = STRING_PIECE" for string
>>>> fields and since it's merely a <const char*, size_t> pair, it can be
>>>> directed at the buffer in the arena. Such features are implemented inside
>>>> Google but unfortunately it's not opensourced due to dependency issues. We
>>>> plan to get them out eventually but hasn't have enough time to work on it.
>>>> Since we already have an internal version of it, we probably won't be able
>>>> to accept your contributions. I can't give a concrete timeline about when
>>>> we will get our implementation opensourced also. Sorry for that...
>>>>
>>>> If you need this soon, I suggest you try to implement it as simple as
>>>> possible. Better to only support lite runtime with arena enabled. Some
>>>> changes you want to make:
>>>> 1. Make ArenaStringPtr work with StringPiece, or introduce an
>>>> ArenaStringPiecePtr which might be easier to implement.
>>>> 2. Update protocol compiler to use ArenaStringPtr/ArenaStringPiecePtr
>>>> to store ctype=STRING_PIECE fields and expose a StringPiece API:
>>>> // proto
>>>> message Foo {
>>>>   string bar = 1 [ctype = STRING_PIECE];
>>>> }
>>>> // generated C++ code
>>>> message Foo {
>>>>  public:
>>>>   StringPiece bar() const;
>>>>   void set_bar(StringPiece value);  // Note that we need to do a deep
>>>> copy here because StringPiece  doesn't own the underlying data.
>>>>   void set_alias_bar(StringPiece value);  // Make the field point to
>>>> the StringPiece data directly. Caller must make sure the underlying data
>>>> outlives the Foo message.
>>>>
>>>>  private:
>>>>   ArenaStringPiecePtr bar_;
>>>> };
>>>>
>>>> Look at the string_field.cc implementation in the compiler directory
>>>> <https://github.com/google/protobuf/blob/master/src/google/protobuf/compiler/cpp/cpp_string_field.cc>
>>>> and you can create a string_piece_field.cc implementation based on that.
>>>> Most of the work will be done here, including not only the generated API
>>>> but also all the parsing/serialization/copy/constructor/destructor support.
>>>>
>>>> That's pretty all that needed to support StringPiece in lite-runtime +
>>>> arena. A lot more work will be needed to support other combinations
>>>> (lite-runtime + no arena, full-runtime + arena, full-runtime + non-arena),
>>>> but since you have a specific targeted platform and we will opensource the
>>>> StringPiece support eventually, it's probably not worthwhile to invest time
>>>> to support anything you don't actually need right now.
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Feng
>>>>
>>>
>>> Hi Feng,
>>>
>>> This is very helpful, thanks!  I'm happy to hear that you are going to
>>> open source the implementation eventually, and thankful for the suggestions
>>> so I can be API compatible where possible.
>>>
>>> With careful googling and knowing what I was looking for, I found a
>>> StringPiece implementation in re2 years ago :)
>>>
>>> When setting ctype = STRING_PIECE, would you remove/replace the void
>>> set_foo(const ::std::string &value) calls, or have add additional ones?
>>>
>> For StringPiece only set_foo(StringPiece str) is needed.
>>
>
> And then I'm assuming foo() is modified to return StringPiece?
>
Right.


>
>
>> Since ::std::string can be converted to a StringPiece pretty easily,
>>> leaving them there should be easy.
>>>
>>> One of my use cases is to take in chunks of data from a data source and
>>> put them together to make a string.  Ideally, I would be able to grow a
>>> string in constant time (assuming constant time chunks), but that probably
>>> isn't practical.  It looks like I should be able to instead allocate a
>>> StringPiece (or the data inside it) inside the arena when the pieces start
>>> coming in, and then hand ownership to it via the set_alias_bar() call above
>>> when the string finishes?
>>>
>> Yes.
>>
>>
>>> Is there a better way to do what I'm trying to do?
>>>
>> You may have already noticed that we have another ctype for string
>> fields: ctype = CORD. This Cord string type allows you to concatenate
>> strings more efficiently without reallocate buffers and can also let the
>> string fields share the underlying data buffer with the input data chunks.
>> Inside Google we rely heavily on this Cord type to avoid string/bytes
>> copies in parsing and serialization. It's in our opensource plan as well.
>>
>>
>>
>>>
>>> I'll need to support full-runtime + arena, but none of the other
>>> combinations.  I'll figure something out to make sure the reflection does
>>> something sane in my case (CHECK(false) might work for what I want to do,
>>> I'll have to try it and see).  The reflection can cheat in my case since I
>>> don't care about it not allocating.
>>>
>> I'm pretty sure with reflection, the proto descriptors will be allocated
>> on heap. Is that acceptable in your use case?
>>
>
> I'm not worried about using reflection in the non-allocation case.  For
> us, reflection is mostly useful for testing, ShortDebugString, and other
> places where the user is willing to pay a larger cost to work with the data
> dynamically.
>
Sounds reasonable. It might not be that hard also. Mostly just going
through all the "switch (ctype)" cases and adding the missing branches that
are stripped out in our opensource process.


> Thanks!
>   Austin
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To post to this group, send email to protobuf@googlegroups.com.
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Reply via email to