I *think* that if you use a proto2 syntax message, it will not actually perform 
this check as of today (only proto3 syntax files do).

If that's not right, I unfortunately suspect the only way around it would 
be to vendor the protobuf runtime into your codebase and comment out the check 
/ log if it's bothering you.

On Friday, September 6, 2024 at 11:43:28 AM UTC-4 [email protected] wrote:

> Thank you for the detailed answer Em, I really appreciate it!
>
> Good to know the warning can probably be ignored for now. I've opted to do 
> the repeated option for now to avoid my logs being drowned in the 
> warnings... I take it there is no way to suppress warnings?
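
The repeated-field approach mentioned here can be sketched roughly as follows. This is only an illustration under stated assumptions: the structs `TxnEntry` and `MergedSnapshot` are hypothetical stand-ins for what protoc would generate from the schema shown in the comment, and `ToMap` / `FromMap` are made-up helper names, not part of protobuf. With a `bytes` key field, raw digests would not be subject to the UTF-8 check.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for protobuf-generated classes. Assumed schema:
//   message TxnEntry       { bytes digest = 1; MyObject value = 2; }
//   message MergedSnapshot { repeated TxnEntry txns = 1; }
struct MyObject { std::string payload; };
struct TxnEntry { std::string digest; MyObject value; };
struct MergedSnapshot { std::vector<TxnEntry> txns; };

using TxnMap = std::map<std::string, MyObject>;

// Parse boundary: move the repeated entries into an STL map.
// If a digest occurs more than once, the last entry wins.
TxnMap ToMap(MergedSnapshot snapshot) {
  TxnMap out;
  for (TxnEntry& entry : snapshot.txns) {
    out[std::move(entry.digest)] = std::move(entry.value);
  }
  return out;
}

// Serialization boundary: copy the STL map back into repeated entries.
MergedSnapshot FromMap(const TxnMap& txns) {
  MergedSnapshot snapshot;
  snapshot.txns.reserve(txns.size());
  for (const auto& kv : txns) {
    snapshot.txns.push_back(TxnEntry{kv.first, kv.second});
  }
  return snapshot;
}
```

Existing code that treats the field like an STL map could then keep working on a `TxnMap`, with the conversion confined to the two boundary functions.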
>
> Best,
> Florian
>
> On Thursday, September 5, 2024 at 5:19:00 PM UTC-4 Em Rauch wrote:
>
>> Using non-UTF8 data in a string field should be understood as incorrect, 
>> but realistically will work today as long as your messages are only used 
>> exactly by C++ Protobuf on the current release of protobuf and only ever 
>> with the binary wire format (not textproto or JSON encoding, etc).
>>
>> Today the malformed utf8 enforcement exists to different degrees in the 
>> different languages (and even depending on the syntax of the .proto file), 
>> but it's not semantically intended that a `string` field should be used for 
>> non-utf8 data in any language. It should be assumed that a serialized 
>> message with a map<string, ?> where the keys are non-utf8 may start to 
>> parse-fail in some future release of Protobuf.
>>
>> Unfortunately bytes as a map key isn't allowed due to obscure technical 
>> concerns related to some non-C++ languages and the JSON representation, and 
>> we don't have an immediate plan to relax that.
>>
>> Realistically your options are:
>> - Keep doing what you're doing: only ever keep these messages in C++ and 
>> binary wire encoding, ignore the warnings, and accept that it might stop 
>> working in a future release of protobuf
>> - Make your key data be valid utf8 strings instead (eg, use a base64 
>> encoding of the digest instead of the raw digest bytes)
>> - Use repeated of a message with a key and value field instead of a map, 
>> and use your own struct as the in-memory representation when processing 
>> (move the data into/out of a STL map at the parse/serialization boundaries 
>> instead).
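
For the second option above (making the keys valid UTF-8), a minimal sketch of base64-encoding a raw digest before using it as a map key might look like this. `Base64Encode` is a hypothetical helper written for illustration and is not a protobuf API; it assumes the standard RFC 4648 alphabet with `=` padding. The output is plain ASCII, so it is always valid UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes arbitrary bytes as standard base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::string& raw) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  auto byte_at = [&raw](std::size_t i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(raw[i]));
  };

  std::string out;
  out.reserve(((raw.size() + 2) / 3) * 4);

  // Full 3-byte groups become 4 output characters each.
  std::size_t i = 0;
  for (; i + 3 <= raw.size(); i += 3) {
    const std::uint32_t n =
        (byte_at(i) << 16) | (byte_at(i + 1) << 8) | byte_at(i + 2);
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out.push_back(kAlphabet[(n >> 6) & 63]);
    out.push_back(kAlphabet[n & 63]);
  }

  // A trailing group of 1 or 2 bytes is padded with '='.
  const std::size_t rem = raw.size() - i;
  if (rem == 1) {
    const std::uint32_t n = byte_at(i) << 16;
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out += "==";
  } else if (rem == 2) {
    const std::uint32_t n = (byte_at(i) << 16) | (byte_at(i + 1) << 8);
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out.push_back(kAlphabet[(n >> 6) & 63]);
    out.push_back('=');
  }
  return out;
}
```

The trade-off is that keys grow by roughly a third and readers have to decode them to recover the raw digest, but the `map<string, ...>` field itself can stay unchanged.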
>>
>> Sorry there's not a more trivial fix available for this use case!
>>
>> On Thursday, September 5, 2024 at 5:03:03 PM UTC-4 [email protected] 
>> wrote:
>>
>>> Hi,
>>>
>>> I've been using protobuf 3.5.1 in c++ and am using a message type with 
>>> the following map type: `map<string, MyObject> txns = 1`
>>>
>>> It is my understanding that `string` and `bytes` are the same in proto 
>>> c++; for maps however one can only use `string` as keys. I'm using the key 
>>> field to send around transaction digests which are byte strings consisting 
>>> of cryptographic hashes. As far as I can tell, it makes no difference 
>>> whether I use strings/bytes (the decoding works), yet I keep getting the 
>>> error:
>>>  
>>>  `String field 'pequinstore.proto.MergedSnapshot.MergedTxnsEntry.key' 
>>> contains invalid UTF-8 data when serializing a protocol buffer. Use the 
>>> 'bytes' type if you intend to send raw bytes.`
>>>
>>> I understand the error is complaining about my digests possibly not 
>>> being UTF-8, but I'm unsure if I actually need to be concerned about it; I 
>>> have not noticed any problems with parsing. Is there a way to suppress this 
>>> error?
>>>
>>> Or, if this is a serious error that could lead to non-deterministic 
>>> behavior, do you have a suggested workaround? There is a lot of existing 
>>> code that uses the map structure akin to an STL map, so I'd like to avoid 
>>> re-factoring the protobuf into a repeated field if possible. 
>>>
>>> Thanks,
>>> Florian
>>>
>>

