[protobuf] Re: Regd: Resolving Wire type ambiguities
It sounds like you actually have the message class available, not just a serialized instance of the message. In that case, you can derive the original types by reading the generated code. If you only have a compiled copy of the class, you can derive the types from its descriptor: use MessageType.getDescriptor() in Java or MessageType::descriptor() in C++ to get it. In C++ you can even call descriptor->file()->DebugString() to generate a .proto-syntax representation of the file.

On Mon, Nov 16, 2009 at 1:32 PM, rahul prasad wrote:

> Hi,
>
> Thanks for the clarification. Because of my ".proto"-less situation, I did
> try one dirty method of finding the original types: I iterated through the
> protocol buffers, tried to extract a field of known wire type with a getter
> of the wrong type, and relied on the exceptions thrown. I know it sucks,
> but it worked for me. Thanks.
>
> Regards,
> Rahul Prasad
>
> On Mon, Nov 16, 2009 at 1:42 PM, Jason Hsueh wrote:
>
>> You can decode the protocol buffer with just wire type + tag number, but
>> you won't know the original types without a proto definition. Everything
>> would be treated as an unknown field. You could access these by iterating
>> through the UnknownFieldSet, but again, you can't recover the original
>> types.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
[protobuf] Re: Regd: Resolving Wire type ambiguities
You can decode the protocol buffer with just wire type + tag number, but you won't know the original types without a proto definition. Everything would be treated as an unknown field. You could access these by iterating through the UnknownFieldSet, but again, you can't recover the original types.

On Sat, Nov 14, 2009 at 1:10 PM, rahul prasad wrote:

> Hi Marc,
>
> Thanks for the clarification. If the actual .proto were available, I would
> not have posted that question in the first place. Anyway, to decode a
> protocol buffer, is it not enough to have just the wire type + tag number
> combination? (Except, of course, that the sub-message-ness and other
> ambiguities you mentioned below have to be handled manually.)
>
> Regards,
> Rahul Prasad
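To make the point above concrete, here is a minimal sketch in Python of walking a serialized buffer using only the key (tag number + wire type) in front of each field. It is not the library's UnknownFieldSet; the names read_varint and iter_fields are hypothetical helpers invented for this illustration, and groups (wire types 3 and 4) are deliberately not handled. Every value comes back in raw form, which is exactly the limitation described: a varint might be an int32, a sint32 or a bool, and a length-delimited payload might be a string, bytes or a sub-message.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 70:              # more than 10 bytes is not a valid varint
            raise ValueError("varint too long")


def iter_fields(buf):
    """Yield (field_number, wire_type, raw_value) for each top-level field.

    Hypothetical helper for illustration; the original types stay unknown.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:           # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:         # 64-bit, kept as 8 raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:         # length-delimited, kept as raw bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:         # 32-bit, kept as 4 raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field_number, wire_type, value


# Field 1 holds the varint 150; field 2 holds the 7 bytes b"testing".
data = bytes.fromhex("089601120774657374696e67")
for field in iter_fields(data):
    print(field)
```

Whether field 2 here was declared as string, bytes or an embedded message cannot be told from the bytes alone.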
[protobuf] Re: Regd: Resolving Wire type ambiguities
Hi Marc,

Thanks for the clarification. If the actual .proto were available, I would not have posted that question in the first place. Anyway, to decode a protocol buffer, is it not enough to have just the wire type + tag number combination? (Except, of course, that the sub-message-ness and other ambiguities you mentioned below have to be handled manually.)

Regards,
Rahul Prasad

On Sat, Nov 14, 2009 at 3:57 PM, Marc Gravell wrote:

> If you treat it as a string (UTF-8), you are likely to get garbage. If you
> treat it as a byte[], then you just get a BLOB - you don't lose anything,
> but you might not be showing some more detail that you could show.
>
> You could, however, check for likely-sub-message-ness - i.e. after getting
> the length, you could try decoding the next few bytes as a varint, and do
> the shift trick; see if it looks likely to be a sub-message etc; you could
> try to validate the entire "string", see if it makes sense. Note that you
> don't have to store any of the data - just follow the rules for each
> wire format until something doesn't look right or you've checked the string.
>
> Easiest, though, is to have the .proto available ;-p
>
> Marc
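For reference, the "wire type + tag number combination" asked about above is packed into a single key in front of every field: key = (field_number << 3) | wire_type. The Python sketch below shows that arithmetic only; split_key, make_key and describe_key are hypothetical names for this illustration, and the key is assumed to have already been decoded from its varint form.

```python
# Wire type numbers and meanings, as in the table from the protobuf documentation.
WIRE_TYPES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group",
    4: "end group",
    5: "32-bit",
}


def split_key(key):
    """Split an already-decoded field key into (field_number, wire_type)."""
    return key >> 3, key & 0x07


def make_key(field_number, wire_type):
    """Inverse of split_key: pack a field number and wire type into a key."""
    return (field_number << 3) | wire_type


def describe_key(key):
    """Human-readable summary of a key (hypothetical helper)."""
    field_number, wire_type = split_key(key)
    name = WIRE_TYPES.get(wire_type, "unknown")
    return f"field {field_number}, wire type {wire_type} ({name})"


print(describe_key(0x12))  # field 2, wire type 2 (length-delimited)
```

The key tells a decoder how many bytes to consume, but not which of the declared types in a wire-type row produced them.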
[protobuf] Re: Regd: Resolving Wire type ambiguities
If you treat it as a string (UTF-8), you are likely to get garbage. If you treat it as a byte[], then you just get a BLOB - you don't lose anything, but you might not be showing some more detail that you could show.

You could, however, check for likely-sub-message-ness - i.e. after getting the length, you could try decoding the next few bytes as a varint, and do the shift trick (the low three bits of the key are the wire type, the remaining bits the field number); see if it looks likely to be a sub-message etc; you could try to validate the entire "string", see if it makes sense. Note that you don't have to store any of the data - just follow the rules for each wire format until something doesn't look right or you've checked the whole string.

Easiest, though, is to have the .proto available ;-p

Marc

2009/11/14 rahul prasad

> Hi,
>
> As seen from the wire types table below, taken from the protobuf
> documentation, if I try to extract a value of wire type 2 from a protobuf,
> it could be a string, a byte array, an embedded message, etc. If I cast the
> value as bytes or string on the decoding side, while on the encoding side
> it was actually an embedded message, what would this result in? Will I be
> able to retrieve the actual value some way or other doing it this way?
>
> The available wire types are as follows:
>
> Type  Meaning           Used For
> 0     Varint            int32, int64, uint32, uint64, sint32, sint64, bool, enum
> 1     64-bit            fixed64, sfixed64, double
> 2     Length-delimited  string, bytes, embedded messages, packed repeated fields
> 3     Start group       groups (deprecated)
> 4     End group         groups (deprecated)
> 5     32-bit            fixed32, sfixed32, float
>
> Regards,
> Rahul Prasad

--
Regards,

Marc
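The sub-message check described above could look roughly like the following Python sketch. It is one possible reading of the suggestion, not Marc's code: looks_like_message and read_varint are hypothetical names, groups are treated as "not a message" for simplicity, and an empty payload is arbitrarily reported as False. The check stores nothing; it only follows the rules for each wire type until something doesn't fit or the payload is exhausted.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def looks_like_message(payload):
    """Heuristic: True if payload parses cleanly as a sequence of fields."""
    if not payload:
        return False                      # assumption: empty is "can't tell"
    pos = 0
    try:
        while pos < len(payload):
            key, pos = read_varint(payload, pos)
            field_number, wire_type = key >> 3, key & 0x07  # the shift trick
            if field_number == 0:         # field numbers start at 1
                return False
            if wire_type == 0:            # varint: skip it
                _, pos = read_varint(payload, pos)
            elif wire_type == 1:          # 64-bit
                pos += 8
            elif wire_type == 2:          # length-delimited
                length, pos = read_varint(payload, pos)
                pos += length
            elif wire_type == 5:          # 32-bit
                pos += 4
            else:                         # groups or invalid wire types
                return False
            if pos > len(payload):        # field runs past the end
                return False
    except ValueError:
        return False
    return True


print(looks_like_message(bytes.fromhex("089601")))  # True
print(looks_like_message(b"testing"))               # False
```

This is only a plausibility test. Short strings or arbitrary bytes can happen to parse as a valid field sequence, so a True result means "could be a sub-message", not "is one".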