[protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python

2010-05-17 Thread JT Olds
Hello,

(I already submitted this via the protobuf Google Group web form, but
I think I screwed up. If not, sorry for the double post.)

I have a C++-based server that uses protocol buffers as its IDL, and I'm
trying to make sure it rejects invalid UTF-8 strings. My systest
library is written in Python. The C++ protocol buffer library does not
seem to do any UTF-8 checking on string fields, whereas the
Python library does. So I added UTF-8 validation to the C++ server
side, and I want to check that it works (in case a C++ client sends
invalid UTF-8). But whenever I inject invalid UTF-8 into the Python
systests to make sure the server rejects the string, the Python
library complains.

Is there a way to override this behavior?

I don't want to change my protocol buffer definitions to use the bytes
type, because these really should be strings, and the Python library
is doing exactly what I want in the general case.

-JT
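A minimal Python 3 sketch of what "invalid UTF-8" means in practice, assuming Python's strict UTF-8 decoder as the reference check. The C++ server's own validator is not shown in the thread, so this is a stand-in rather than its implementation. The helper name `is_valid_utf8` and the sample payloads are illustrative; each sample is a byte string a strict decoder rejects, which makes it a candidate systest payload.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical systest payloads: byte strings a strict UTF-8 decoder rejects.
BAD_UTF8_SAMPLES = {
    "lone 0xFF byte": b"\xff",
    "truncated two-byte sequence": b"\xc3",
    "bad continuation byte": b"\xc3\x28",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
}

for label, payload in BAD_UTF8_SAMPLES.items():
    print(f"{label}: valid={is_valid_utf8(payload)}")
```

Note that Python's decoder accepts the noncharacter U+FFFE (`b"\xef\xbf\xbe"`), because it is a valid Unicode scalar value; a server with its own policy for such code points needs a separate check.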

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python

2010-05-17 Thread Jason Hsueh
If you compile with the macro GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED
defined, the C++ code will do UTF-8 validation. However, it doesn't prevent
the data from being serialized or parsed; it simply logs an error message.
How would you like it to fail?




Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python

2010-05-17 Thread JT Olds
Okay, well, it's slightly more complicated. My C++ application actually
needs to accept the technically invalid code points U+ and U+FFFE.
Apart from those, my server application needs to detect invalid UTF-8
when it arrives. That part is fine; I have it all implemented.

What I want is to exercise that behavior from my Python systest
framework, and there the Python libs are trying to be too helpful.
While I normally want them to do UTF-8 validation, I *don't* want
them to during the systests, because I want to send bad UTF-8 to
the server.

Does that make sense? I'm deliberately doing bad things in a systest
environment to make sure everything still works.

-JT
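Another way around the Python library's check, sketched here as an alternative rather than anything proposed in the thread, is to bypass the library for the bad field and write its wire-format bytes directly. A string field uses wire type 2 (length-delimited): a varint key `(field_number << 3) | 2`, a varint byte count, then the raw bytes, and nothing in that framing requires the bytes to be UTF-8. The function names and the field number 1 below are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_length_delimited_field(field_number: int, payload: bytes) -> bytes:
    """Encode one wire-type-2 field carrying `payload` verbatim (no UTF-8 check)."""
    wire_type_length_delimited = 2
    key = (field_number << 3) | wire_type_length_delimited
    return encode_varint(key) + encode_varint(len(payload)) + payload


# Hypothetical: field 1 is the string field under test; 0xFF is never valid UTF-8.
raw = encode_length_delimited_field(1, b"bad \xff utf8")
print(raw.hex())
```

The output can be sent wherever the test would normally send `SerializeToString()` output. Since parsers accept fields in any order and keep the last value seen for a non-repeated field, these bytes could also be appended to the serialization of an otherwise valid message.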




Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python

2010-05-17 Thread JT Olds
It looks like I figured out a solution, though I'm not sure this is
the best way.

I have:

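   pbuf = MyProtoBuf()
   # Assign an empty string first so the normal setter bookkeeping runs
   # (sets _has_string_field, etc.).
   pbuf.string_field = ''
   # Then overwrite the private storage directly; bad_utf8 is a
   # placeholder for a byte string that is not valid UTF-8.
   pbuf._value_string_field = bad_utf8
   # Make the serializer treat the field as bytes rather than string.
   # DESCRIPTOR is shared by the class, so this affects every MyProtoBuf.
   f = pbuf.DESCRIPTOR.fields_by_number[pbuf.STRING_FIELD_FIELD_NUMBER]
   f.type = f.TYPE_BYTES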
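The workaround above depends on private attributes (`_value_string_field`) and on mutating the shared field descriptor, so it may behave differently across protobuf versions. Below is a small sketch for checking what actually went out on the wire: it walks the top-level fields of a serialized message and returns the raw payloads of one length-delimited field. The function names are hypothetical, and groups (wire types 3 and 4) are not handled.

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def raw_field_values(buf: bytes, field_number: int) -> list[bytes]:
    """Return the raw payload of every length-delimited occurrence of a field."""
    found = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            if number == field_number:
                found.append(buf[pos:pos + length])
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:  # groups (3, 4) are out of scope for this sketch
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("field runs past end of buffer")
    return found


# Field 1 holds the raw byte 0xFF; field 2 is the varint 150.
sample = b"\x0a\x01\xff\x10\x96\x01"
print(raw_field_values(sample, 1))
```

In the systest, one might then assert something like `raw_field_values(pbuf.SerializeToString(), field_number) == [bad_utf8]` before sending, where `field_number` and `bad_utf8` are placeholders.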
