[protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
Hello,

(I already submitted this via the protobuf Google Group web form, but I think I screwed up. If not, sorry for the double post.)

I have a C++-based server using protocol buffers as the IDL, and I'm trying to ensure that it rejects invalid UTF-8 strings. My systest library is written in Python.

The C++ protocol buffer library does not seem to do any UTF-8 checking on string fields, whereas the Python library does. So I added some UTF-8 validation to the C++ server side, and I want to check that it works (in case a C++ client sends invalid UTF-8).

Whenever I inject invalid UTF-8 into the Python systests to make sure the server rejects the string, the Python library complains. Is there a way to override this behavior? I don't want to change my protocol buffer definitions to use the bytes type, because these really should be strings, and the Python library is doing exactly what I want for the general case.

-JT

--
You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to proto...@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
If you compile with the macro GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED defined, the C++ code will do UTF-8 validation. However, it doesn't prevent the data from serializing or parsing; it will simply log an error message. How would you like it to fail?

On Mon, May 17, 2010 at 3:15 PM, JT Olds jto...@xnet5.com wrote:
> Whenever I inject invalid UTF-8 into the Python systests to make sure the server rejects the string, the Python library complains. Is there a way to override this behavior?
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
Okay, well, it's slightly more complicated. My C++ application actually needs to accept the technically invalid code points U+ and U+FFFE. Otherwise, I need my server application to know when invalid UTF-8 has happened. That's all fine; I have all of that implemented.

The problem is that I want to exercise that behavior from my Python systest framework, and the Python libs are trying to be too helpful. While I normally want them to do UTF-8 validation, I *don't* want them to during the systests, because I want to send bad UTF-8 to the server.

Make sense? I'm trying to do bad things to make sure stuff still works in a systest environment.

-JT

On Mon, May 17, 2010 at 4:51 PM, Jason Hsueh jas...@google.com wrote:
> If you compile with the macro GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED defined, the C++ code will do UTF8 validation. However, it doesn't prevent the data from serializing or parsing, it will simply log an error message. How would you like it to fail?
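For the systests described above, it may help to keep a small table of byte strings that are known to be malformed UTF-8. The following is only a sketch: the names `INVALID_UTF8_SAMPLES` and `is_valid_utf8` are hypothetical and do not come from the thread, and the check uses Python's own strict decoder, which may not agree in every corner case with a server's custom validator.

```python
# Hypothetical test payloads; none of these names appear in the thread.
# Each value is a byte string that Python's strict UTF-8 decoder rejects.
INVALID_UTF8_SAMPLES = {
    "lone continuation byte": b"\x80",
    "truncated 2-byte sequence": b"\xc3",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "byte that never appears in UTF-8": b"\xff",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+FFFE is a noncharacter, but its UTF-8 encoding is well-formed,
# so Python's decoder accepts it.
U_FFFE_BYTES = "\ufffe".encode("utf-8")
```

Note that a server which has to accept U+FFFE would be fed `U_FFFE_BYTES` as a positive case, while every entry in `INVALID_UTF8_SAMPLES` is a candidate negative case.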
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
It looks like I figured out a solution, though I'm not sure this is the best way. I have:

    pbuf = MyProtoBuf()
    pbuf.string_field = ""  # to make sure pbuf initialization stuff works (sets _has_string_field, etc.)
    pbuf._value_string_field = "bad utf8"
    f = pbuf.DESCRIPTOR.fields_by_number[pbuf.STRING_FIELD_NUMBER]
    f.type = f.TYPE_BYTES

On Mon, May 17, 2010 at 5:37 PM, JT Olds jto...@xnet5.com wrote:
> While I normally want them to do UTF-8 validation, I *don't* want them to during the systests, because I want to send bad UTF-8 to the server.
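The workaround in this message touches underscore-prefixed attributes, which are internal by Python convention and may not exist in other versions of the library. One alternative, sketched here as an assumption and not as something proposed in the thread, is to build the serialized message by hand. On the wire, `string` and `bytes` fields are both length-delimited (wire type 2), so a test can emit arbitrary bytes for a string field without going through the Python setter. The helper names and the field number 1 are hypothetical.

```python
LENGTH_DELIMITED = 2  # protobuf wire type shared by string and bytes fields


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Serialize one length-delimited field: tag, length, then raw payload."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Assumption: the string field under test has field number 1 in the .proto.
raw_message = encode_length_delimited(1, b"ok\xff")
```

The resulting `raw_message` can be sent to the server in place of the output of `SerializeToString()`. This only covers a message with a single field; a message with more fields would need each one encoded and concatenated.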