On Fri, Jan 15, 2016 at 1:30 PM Feng Xiao <[email protected]> wrote:
> On Fri, Jan 15, 2016 at 11:50 AM, Austin Schuh <[email protected]>
> wrote:
>
>> On Fri, Jan 15, 2016 at 11:32 AM Feng Xiao <[email protected]> wrote:
>>
>>> On Thu, Jan 14, 2016 at 6:06 PM, Austin Schuh <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've got an application where I can't allocate memory while using
>>>> protobufs. Arenas have been awesome for doing that. I'm able to allocate
>>>> a big block of memory at startup time or stack allocate memory for the
>>>> arena, and then use that for allocating protobufs. Thanks!
>>>>
>>>> I'd like to be able to allocate strings in the arena. I'm willing to
>>>> do the implementation, and wouldn't mind up-streaming if my implementation
>>>> is complete enough and there is interest. It looks like I should start by
>>>> implementing ctype=STRING_PIECE and then allocate memory in the arena to
>>>> back it. The class in //src/google/protobuf:arenastring.h looks like the
>>>> place to do all the operations. It looks like I need to modify the
>>>> interface to provide setters and getters to support STRING_PIECE there.
>>>>
>>>> Is that the right place to start? Is there any more guidance that you
>>>> can give me?
>>>>
>>> Hi Austin,
>>>
>>> Thanks for contacting us and offering help!
>>>
>>> You are looking in the right direction. We actually open-sourced
>>> the StringPiece implementation not very long ago:
>>>
>>> https://github.com/google/protobuf/blob/master/src/google/protobuf/stubs/stringpiece.h
>>>
>>> It's intended to be used to implement "ctype = STRING_PIECE" for string
>>> fields, and since it's merely a <const char*, size_t> pair, it can be
>>> directed at the buffer in the arena. Such features are implemented inside
>>> Google, but unfortunately they are not open-sourced due to dependency
>>> issues. We plan to get them out eventually but haven't had enough time to
>>> work on it. Since we already have an internal version of it, we probably
>>> won't be able to accept your contributions.
>>> I can't give a concrete timeline for when
>>> we will get our implementation open-sourced either. Sorry about that...
>>>
>>> If you need this soon, I suggest you keep the implementation as simple as
>>> possible. Better to only support the lite runtime with arena enabled. Some
>>> changes you will want to make:
>>> 1. Make ArenaStringPtr work with StringPiece, or introduce an
>>>    ArenaStringPiecePtr, which might be easier to implement.
>>> 2. Update the protocol compiler to use ArenaStringPtr/ArenaStringPiecePtr
>>>    to store ctype=STRING_PIECE fields and expose a StringPiece API:
>>> // proto
>>> message Foo {
>>>   string bar = 1 [ctype = STRING_PIECE];
>>> }
>>> // generated C++ code
>>> class Foo {
>>>  public:
>>>   StringPiece bar() const;
>>>   // Note that we need to do a deep copy here because StringPiece
>>>   // doesn't own the underlying data.
>>>   void set_bar(StringPiece value);
>>>   // Make the field point to the StringPiece data directly. Caller must
>>>   // make sure the underlying data outlives the Foo message.
>>>   void set_alias_bar(StringPiece value);
>>>
>>>  private:
>>>   ArenaStringPiecePtr bar_;
>>> };
>>>
>>> Look at the string_field.cc implementation in the compiler directory
>>> <https://github.com/google/protobuf/blob/master/src/google/protobuf/compiler/cpp/cpp_string_field.cc>
>>> and you can create a string_piece_field.cc implementation based on that.
>>> Most of the work will be done here, including not only the generated API
>>> but also all the parsing/serialization/copy/constructor/destructor support.
>>>
>>> That's pretty much all that's needed to support StringPiece in
>>> lite-runtime + arena. A lot more work will be needed to support other
>>> combinations (lite-runtime + no arena, full-runtime + arena,
>>> full-runtime + non-arena), but since you have a specific target platform
>>> and we will open-source the StringPiece support eventually, it's probably
>>> not worthwhile to invest time in supporting anything you don't actually
>>> need right now.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Feng
>>>
>>
>> Hi Feng,
>>
>> This is very helpful, thanks! I'm happy to hear that you are going to
>> open source the implementation eventually, and thankful for the suggestions
>> so I can be API compatible where possible.
>>
>> With careful googling and knowing what I was looking for, I found a
>> StringPiece implementation in re2 years ago :)
>>
>> When setting ctype = STRING_PIECE, would you remove/replace the void
>> set_foo(const ::std::string &value) calls, or add additional ones?
>>
> For StringPiece, only set_foo(StringPiece str) is needed.
>

And then I'm assuming foo() is modified to return StringPiece?

>> Since ::std::string can be converted to a StringPiece pretty easily,
>> leaving them there should be easy.
>>
>> One of my use cases is to take in chunks of data from a data source and
>> put them together to make a string. Ideally, I would be able to grow a
>> string in constant time (assuming constant time chunks), but that probably
>> isn't practical. It looks like I should be able to instead allocate a
>> StringPiece (or the data inside it) inside the arena when the pieces start
>> coming in, and then hand ownership to it via the set_alias_bar() call above
>> when the string finishes?
>>
> Yes.
>
>
>> Is there a better way to do what I'm trying to do?
>>
> You may have already noticed that we have another ctype for string fields:
> ctype = CORD. This Cord string type allows you to concatenate strings more
> efficiently without reallocating buffers and can also let the string fields
> share the underlying data buffer with the input data chunks. Inside Google
> we rely heavily on this Cord type to avoid string/bytes copies in parsing
> and serialization. It's in our open-source plan as well.
>
>
>
>>
>> I'll need to support full-runtime + arena, but none of the other
>> combinations.
>> I'll figure something out to make sure the reflection does
>> something sane in my case (CHECK(false) might work for what I want to do,
>> I'll have to try it and see). The reflection can cheat in my case since I
>> don't care about it not allocating.
>>
> I'm pretty sure that with reflection, the proto descriptors will be
> allocated on the heap. Is that acceptable in your use case?
>

I'm not worried about using reflection in the non-allocation case. For us,
reflection is mostly useful for testing, ShortDebugString, and other places
where the user is willing to pay a larger cost to work with the data
dynamically.

Thanks!
Austin

--
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
Visit this group at https://groups.google.com/group/protobuf.
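
The two setters discussed in this thread (deep copy versus alias) and the chunk-assembly use case can be sketched without any protobuf code at all. The block below is only an illustration under stated assumptions: `Piece`, `BumpArena`, `Foo`, and `ChunkBuilder` are hypothetical names invented for this sketch, not protobuf classes, and `set_bar` returns `bool` here (the thread's version returns `void`) so that arena exhaustion is visible to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for StringPiece: a non-owning (pointer, length) pair.
struct Piece {
  const char* data;
  std::size_t size;
  std::string ToString() const {
    return size ? std::string(data, size) : std::string();
  }
};

// Hypothetical bump allocator standing in for an arena. It never touches the
// heap: every byte comes from the caller-supplied buffer.
class BumpArena {
 public:
  BumpArena(char* buffer, std::size_t capacity)
      : buffer_(buffer), capacity_(capacity), used_(0) {}

  // Returns `n` bytes from the buffer, or nullptr if it is exhausted.
  char* Allocate(std::size_t n) {
    if (n > capacity_ - used_) return nullptr;
    char* out = buffer_ + used_;
    used_ += n;
    return out;
  }
  std::size_t used() const { return used_; }

 private:
  char* buffer_;
  std::size_t capacity_;
  std::size_t used_;
};

// Hand-written imitation of the generated class sketched in the thread.
class Foo {
 public:
  explicit Foo(BumpArena* arena) : arena_(arena), bar_{"", 0} {}

  Piece bar() const { return bar_; }

  // Deep copy: bytes are copied into the arena, so `value` may go away.
  // Returns false (field unchanged) if the arena is out of space.
  bool set_bar(Piece value) {
    char* dst = arena_->Allocate(value.size);
    if (dst == nullptr) return false;
    if (value.size > 0) std::memcpy(dst, value.data, value.size);
    bar_ = Piece{dst, value.size};
    return true;
  }

  // Alias: no copy. The caller guarantees the bytes outlive this Foo.
  void set_alias_bar(Piece value) { bar_ = value; }

 private:
  BumpArena* arena_;
  Piece bar_;
};

// Assembles a string from chunks directly in arena memory. `max_size` bytes
// are reserved up front, so each Append is a bounds check plus a memcpy.
class ChunkBuilder {
 public:
  ChunkBuilder(BumpArena* arena, std::size_t max_size)
      : data_(arena->Allocate(max_size)),
        capacity_(data_ ? max_size : 0),
        size_(0) {}

  // Returns false if the chunk does not fit in the reserved space.
  bool Append(Piece chunk) {
    if (chunk.size > capacity_ - size_) return false;
    if (chunk.size > 0) std::memcpy(data_ + size_, chunk.data, chunk.size);
    size_ += chunk.size;
    return true;
  }

  // The assembled bytes live in the arena, so they can be aliased safely
  // by any message that does not outlive that arena.
  Piece Finish() const { return Piece{data_, size_}; }

 private:
  char* data_;
  std::size_t capacity_;
  std::size_t size_;
};
```

A caller would build the string with `ChunkBuilder::Append` as chunks arrive and then pass `Finish()` to `set_alias_bar()`, which matches the flow confirmed with "Yes." above. Reserving a maximum size up front is a simplification chosen for this sketch: it trades some unused arena space for appends that never reallocate, and it does not attempt the buffer sharing that the thread attributes to Cord.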
