On Fri, Jan 15, 2016 at 1:30 PM Feng Xiao <[email protected]> wrote:

> On Fri, Jan 15, 2016 at 11:50 AM, Austin Schuh <[email protected]>
> wrote:
>
>> On Fri, Jan 15, 2016 at 11:32 AM Feng Xiao <[email protected]> wrote:
>>
>>> On Thu, Jan 14, 2016 at 6:06 PM, Austin Schuh <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've got an application where I can't allocate memory while using
>>>> protobufs.  Arenas have been awesome for doing that.  I'm able to allocate
>>>> a big block of memory at startup time or stack allocate memory for the
>>>> arena, and then use that for allocating protobufs.  Thanks!
>>>>
>>>> I'd like to be able to allocate strings in the arena.  I'm willing to
>>>> do the implementation, and wouldn't mind up-streaming if my implementation
>>>> is complete enough and there is interest.  It looks like I should start by
>>>> implementing ctype=STRING_PIECE and then allocate memory in the arena to
>>>> back it.  The class in //src/google/protobuf:arenastring.h looks like the
>>>> place to do all the operations.  It looks like I need to modify the
>>>> interface to provide setters and getters to support STRING_PIECE there.
>>>>
>>>> Is that the right place to start?  Is there any more guidance that you
>>>> can give me?
>>>>
>>> Hi Austin,
>>>
>>> Thanks for contacting us and offering help!
>>>
>>> You are looking at the right direction. We actually already opensourced
>>> the StringPiece implementation not very long ago:
>>>
>>> https://github.com/google/protobuf/blob/master/src/google/protobuf/stubs/stringpiece.h
>>>
>>> It's intended to be used to implement "ctype = STRING_PIECE" for string
>>> fields and since it's merely a <const char*, size_t> pair, it can be
>>> directed at the buffer in the arena. Such features are implemented inside
>>> Google but unfortunately it's not opensourced due to dependency issues. We
>>> plan to get them out eventually but hasn't have enough time to work on it.
>>> Since we already have an internal version of it, we probably won't be able
>>> to accept your contributions. I can't give a concrete timeline about when
>>> we will get our implementation opensourced also. Sorry for that...
>>>
>>> If you need this soon, I suggest you try to implement it as simple as
>>> possible. Better to only support lite runtime with arena enabled. Some
>>> changes you want to make:
>>> 1. Make ArenaStringPtr work with StringPiece, or introduce an
>>> ArenaStringPiecePtr which might be easier to implement.
>>> 2. Update protocol compiler to use ArenaStringPtr/ArenaStringPiecePtr to
>>> store ctype=STRING_PIECE fields and expose a StringPiece API:
>>> // proto
>>> message Foo {
>>>   string bar = 1 [ctype = STRING_PIECE];
>>> }
>>> // generated C++ code
>>> message Foo {
>>>  public:
>>>   StringPiece bar() const;
>>>   void set_bar(StringPiece value);  // Note that we need to do a deep
>>> copy here because StringPiece  doesn't own the underlying data.
>>>   void set_alias_bar(StringPiece value);  // Make the field point to the
>>> StringPiece data directly. Caller must make sure the underlying data
>>> outlives the Foo message.
>>>
>>>  private:
>>>   ArenaStringPiecePtr bar_;
>>> };
>>>
>>> Look at the string_field.cc implementation in the compiler directory
>>> <https://github.com/google/protobuf/blob/master/src/google/protobuf/compiler/cpp/cpp_string_field.cc>
>>> and you can create a string_piece_field.cc implementation based on that.
>>> Most of the work will be done here, including not only the generated API
>>> but also all the parsing/serialization/copy/constructor/destructor support.
>>>
>>> That's pretty all that needed to support StringPiece in lite-runtime +
>>> arena. A lot more work will be needed to support other combinations
>>> (lite-runtime + no arena, full-runtime + arena, full-runtime + non-arena),
>>> but since you have a specific targeted platform and we will opensource the
>>> StringPiece support eventually, it's probably not worthwhile to invest time
>>> to support anything you don't actually need right now.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Feng
>>>
>>
>> Hi Feng,
>>
>> This is very helpful, thanks!  I'm happy to hear that you are going to
>> open source the implementation eventually, and thankful for the suggestions
>> so I can be API compatible where possible.
>>
>> With careful googling and knowing what I was looking for, I found a
>> StringPiece implementation in re2 years ago :)
>>
>> When setting ctype = STRING_PIECE, would you remove/replace the void
>> set_foo(const ::std::string &value) calls, or have add additional ones?
>>
> For StringPiece only set_foo(StringPiece str) is needed.
>

And then I'm assuming foo() is modified to return StringPiece?


> Since ::std::string can be converted to a StringPiece pretty easily,
>> leaving them there should be easy.
>>
>> One of my use cases is to take in chunks of data from a data source and
>> put them together to make a string.  Ideally, I would be able to grow a
>> string in constant time (assuming constant time chunks), but that probably
>> isn't practical.  It looks like I should be able to instead allocate a
>> StringPiece (or the data inside it) inside the arena when the pieces start
>> coming in, and then hand ownership to it via the set_alias_bar() call above
>> when the string finishes?
>>
> Yes.
>
>
>> Is there a better way to do what I'm trying to do?
>>
> You may have already noticed that we have another ctype for string fields:
> ctype = CORD. This Cord string type allows you to concatenate strings more
> efficiently without reallocate buffers and can also let the string fields
> share the underlying data buffer with the input data chunks. Inside Google
> we rely heavily on this Cord type to avoid string/bytes copies in parsing
> and serialization. It's in our opensource plan as well.
>
>
>
>>
>> I'll need to support full-runtime + arena, but none of the other
>> combinations.  I'll figure something out to make sure the reflection does
>> something sane in my case (CHECK(false) might work for what I want to do,
>> I'll have to try it and see).  The reflection can cheat in my case since I
>> don't care about it not allocating.
>>
> I'm pretty sure with reflection, the proto descriptors will be allocated
> on heap. Is that acceptable in your use case?
>

I'm not worried about using reflection in the non-allocation case.  For us,
reflection is mostly useful for testing, ShortDebugString, and other places
where the user is willing to pay a larger cost to work with the data
dynamically.

Thanks!
  Austin

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Reply via email to