Re: Deserializing Messages of unknown type at compile-time

2008-09-12 Thread Kenton Varda
Even if the message contains only one, non-repeated field, ParseFrom*() will
keep reading until EOF or an error.
At Google, we have many container formats, for streaming,
record-based files, database tables, etc., where each record is a protocol
buffer.  All of these formats store the size of the message before the
message itself.  Our philosophy is that because we have protocol buffers,
all of these *other* formats and protocols can be designed to pass around
arbitrary byte blobs, which greatly simplifies them.  An arbitrary byte blob
is not necessarily self-delimiting, so it's up to these container formats to
keep track of the size separately.
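
A minimal sketch of the "size before the message" convention described above, assuming a fixed 4-byte big-endian length prefix. The class and method names are hypothetical and are not part of the protobuf library; each record is treated as an opaque byte blob, which is where a serialized message would go.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container format: [4-byte size][record bytes], repeated.
final class LengthPrefixedRecords {
    private LengthPrefixedRecords() {}

    // Prefix one opaque record with its size as a 4-byte big-endian int.
    static byte[] frame(byte[] record) {
        return ByteBuffer.allocate(4 + record.length)
                .putInt(record.length)
                .put(record)
                .array();
    }

    // Split a buffer of back-to-back framed records into the original blobs.
    static List<byte[]> unframeAll(byte[] container) {
        ByteBuffer in = ByteBuffer.wrap(container);
        List<byte[]> records = new ArrayList<>();
        while (in.remaining() >= 4) {
            int size = in.getInt();
            if (size < 0 || size > in.remaining()) {
                throw new IllegalArgumentException("truncated or corrupt record");
            }
            byte[] record = new byte[size];
            in.get(record);
            records.add(record);
        }
        if (in.hasRemaining()) {
            throw new IllegalArgumentException("trailing bytes after last record");
        }
        return records;
    }
}
```

Each blob returned by unframeAll() is exactly one record, so it could be handed to a parser without any trailing data.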

On Thu, Sep 11, 2008 at 8:20 PM, <[EMAIL PROTECTED]> wrote:

>
> Kenton,
>
> > No, it won't work.  Protocol buffers are not self-delimiting.  They assume
> > that the input you provide is supposed to be one complete message, not a
> > message possibly followed by other stuff.
>
>
> There are a couple of related threads about delimiting the outer
> message (with either a marker or a length). The need for this seems to
> arise from streaming  (especially when input would block such as on a
> network socket).
>
> Could this not be solved by a simple convention in the proto file ?
> (Maybe I am missing something big here)
>
> Let us say we have a proto as follows
>
> message TRPProtocol
> {
>   message TRPPDU
>   {
>     required int32 version;
>     required int32 type;
>
>     optional HelloRequest   hello_req = 1;
>     optional HelloResponse  hello_resp = 2;
>     optional ConnectRequest connect_req = 3;
>     // etc etc
>   };
>   required TRPPDU  thepdu=1;
> };
>
> On the wire the outer message is not length delimited, but the inner
> message is. The inner message is represented by the 'required' field
> 'thepdu'.
>
> It would then be possible to stream instances of the inner message
> "TRPPDU". I hope my understanding is correct. Could you write
> something like the following ?
>
> TRPProtocol::TRPPDU Pdu;
> // socket_fd has been opened and initialized earlier
> Pdu.ParseFromFileDescriptor(socket_fd);
>
> to read just one message, respond to that if needed, and then read the
> next one.
>
> Is my understanding correct ? Is this how it is done at Google when
> using PB for client - server comms ?
>
> Thanks,
>
> Vivek
>
>
> >
> > You will need to somehow communicate the size of the message and make sure
> > to limit the input to that size.
>
>
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Deserializing Messages of unknown type at compile-time

2008-09-11 Thread vivek

Kenton,

> No, it won't work.  Protocol buffers are not self-delimiting.  They assume
> that the input you provide is supposed to be one complete message, not a
> message possibly followed by other stuff.


There are a couple of related threads about delimiting the outer
message (with either a marker or a length). The need for this seems to
arise from streaming  (especially when input would block such as on a
network socket).

Could this not be solved by a simple convention in the proto file ?
(Maybe I am missing something big here)

Let us say we have a proto as follows

message TRPProtocol
{
  message TRPPDU
  {
    required int32 version;
    required int32 type;

    optional HelloRequest   hello_req = 1;
    optional HelloResponse  hello_resp = 2;
    optional ConnectRequest connect_req = 3;
    // etc etc
  };
  required TRPPDU  thepdu=1;
};

On the wire the outer message is not length delimited, but the inner
message is. The inner message is represented by the 'required' field
'thepdu'.

It would then be possible to stream instances of the inner message
"TRPPDU". I hope my understanding is correct. Could you write
something like the following ?

TRPProtocol::TRPPDU Pdu;
// socket_fd has been opened and initialized earlier
Pdu.ParseFromFileDescriptor(socket_fd);

to read just one message, respond to that if needed, and then read the
next one.

Is my understanding correct ? Is this how it is done at Google when
using PB for client - server comms ?

Thanks,

Vivek


>
> You will need to somehow communicate the size of the message and make sure
> to limit the input to that size.





Re: Deserializing Messages of unknown type at compile-time

2008-09-11 Thread Alex Loddengaard
Hi Chris,

Once I learned that Messages are not self-delimiting (thanks, Kenton!), I
started working with Hadoop's source to stop the trailing bits from being
included in the InputStream.  I've since fixed this issue, kind of at least
;).

Perhaps a good general solution is to allow a user to put an option in a
.proto file or a Message declaration that makes Messages self-delimiting.
That way users who want speed don't need to use it, and users who want
convenience can use it.  The implementation of this would probably be
tricky, I'm sure.

Thanks for the follow up, Chris.  For now I'm good to go!  Let me know if I
can provide any other feedback.

Alex

On Thu, Sep 11, 2008 at 4:18 PM, Chris <[EMAIL PROTECTED]> wrote:

> Hi Alex,
>
> Kenton Varda wrote:
>
>  On Mon, Sep 8, 2008 at 9:11 PM, Alex Loddengaard <
>> [EMAIL PROTECTED] > wrote:
>>
>>I have a follow-up question:
>>
>>Will using
>>/messageInstance.newBuilderForType().mergeFrom(input).build();/
>>work for a stream that contains trailing binary information?
>>
>>
>> No, it won't work.  Protocol buffers are not self-delimiting.  They assume
>> that the input you provide is supposed to be one complete message, not a
>> message possibly followed by other stuff.
>>
>> You will need to somehow communicate the size of the message and make sure
>> to limit the input to that size.
>>
> Aha.  This message case is one of the heretofore
> hypothetical use cases I am discussing in the adjacent thread on this
> mailing list / group.  The thread is online at
>
>
> http://groups.google.com/group/protobuf/browse_thread/thread/b0ce2c7d8b05896e?hl=en
> and was spawned from
>
> http://groups.google.com/group/protobuf/browse_thread/thread/b0ce2c7d8b05896e?hl=en#
>
> This is mainly myself, Jon, and Kenton slowly forming a consensus on the
> right API for delimited messages.  I had proposed simply adding the length
> (varint) before the message, and Kenton demonstrated c++ code for this.  Jon
> proposed adding a field number / wiretype tag before the length and message,
> which makes it look much more like a protocol-buffer field on the wire.
>
> What do you need Alex?
>
> --
> Chris
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-11 Thread Chris

Hi Alex,

Kenton Varda wrote:
> On Mon, Sep 8, 2008 at 9:11 PM, Alex Loddengaard 
> <[EMAIL PROTECTED] > wrote:
>
> I have a follow-up question:
>
> Will using
> /messageInstance.newBuilderForType().mergeFrom(input).build();/
> work for a stream that contains trailing binary information?
>
>
> No, it won't work.  Protocol buffers are not self-delimiting.  They 
> assume that the input you provide is supposed to be one complete 
> message, not a message possibly followed by other stuff.
>
> You will need to somehow communicate the size of the message and make 
> sure to limit the input to that size.
Aha.  This message case is one of the heretofore 
hypothetical use cases I am discussing in the adjacent thread on this 
mailing list / group.  The thread is online at

http://groups.google.com/group/protobuf/browse_thread/thread/b0ce2c7d8b05896e?hl=en
and was spawned from
http://groups.google.com/group/protobuf/browse_thread/thread/b0ce2c7d8b05896e?hl=en#

This is mainly myself, Jon, and Kenton slowly forming a consensus on the 
right API for delimited messages.  I had proposed simply adding the 
length (varint) before the message, and Kenton demonstrated C++ code for 
this.  Jon proposed adding a field number / wiretype tag before the 
length and message, which makes it look much more like a protocol-buffer 
field on the wire.
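
A rough sketch of the varint length prefix mentioned above, assuming the usual base-128 encoding (7 payload bits per byte, low-order group first, high bit set on every byte except the last). The class and method names are hypothetical; this is not the API the thread eventually settled on.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper: [varint size][message bytes].
final class VarintFraming {
    private VarintFraming() {}

    // Encode a non-negative int as a base-128 varint.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // last byte, high bit clear
        return out.toByteArray();
    }

    // Decode one varint from the buffer's current position.
    static int decodeVarint(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint too long");
    }

    // Prefix an opaque message with its varint-encoded size.
    static byte[] frame(byte[] message) {
        byte[] prefix = encodeVarint(message.length);
        return ByteBuffer.allocate(prefix.length + message.length)
                .put(prefix)
                .put(message)
                .array();
    }

    // Read exactly one framed message, leaving the buffer at the next frame.
    static byte[] readFrame(ByteBuffer in) {
        int size = decodeVarint(in);
        if (size < 0 || size > in.remaining()) {
            throw new IllegalArgumentException("truncated or corrupt frame");
        }
        byte[] message = new byte[size];
        in.get(message);
        return message;
    }
}
```

The tagged variant would place one more varint, built from a field number and a wire type, in front of the length; that part is left out of this sketch.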

What do you need, Alex?

-- 
Chris




Re: Deserializing Messages of unknown type at compile-time

2008-09-09 Thread Alex Loddengaard
Thanks for your feedback, Kenton!  You've answered all of my questions.

Alex

On Wed, Sep 10, 2008 at 1:00 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> On Mon, Sep 8, 2008 at 10:52 PM, Alex Loddengaard <
> [EMAIL PROTECTED]> wrote:
>
>> I should revise my problem slightly.  I had said that I am given an
>> instance of a Message class when deserializing.  This is true, though
>> sometimes that instance is null.  In the cases when it's null, I'm not able
>> to call newBuilderForType() on it.  I'm not able to call
>> getDefaultInstance(), either.  This is now problematic, though there may be
>> a work around.  Also given to me is a Class instance of the Message.  I'm
>> using Reflection to instantiate a new Message instance, then
>> getDefaultInstance() to get the default instance, and then I'm calling
>> newBuilderForType().  Is this problematic?
>>
>
> Hmm, I think the framework you are using is poorly designed -- it should
> always give you a non-null default instance.  Using Java reflection is ugly.
>
> getDefaultInstance() is actually a static method.  So, you don't have to
> instantiate a new message instance first -- just call the static method
> without an instance.  You can't actually instantiate the message class
> directly anyway, since the constructors are private.
>
>
>>
>>
>> Thanks again.  Sorry for all the spam!
>>
>> Alex
>>
>>
>> On Tue, Sep 9, 2008 at 12:11 PM, Alex Loddengaard <
>> [EMAIL PROTECTED]> wrote:
>>
>>> I have a follow-up question:
>>>
>>> Will using *
>>> messageInstance.newBuilderForType().mergeFrom(input).build();* work for
>>> a stream that contains trailing binary information?
>>>
>>> I'm asking this question for the following reason: I'm using a very
>>> simple example where my Message just contains a single String.  When I print
>>> the serialized message with a value of "my_string", I get
>>> "[binary]my_string".  Now, when I see the stream coming in on the
>>> deserialization side, I get "[binary]my_string[binary]".  The leading binary
>>> is the same as the original, however the trailing binary is something new
>>> entirely.  The trailing binary is probably being created by Hadoop.
>>>
>>> Kenton, you have made it very clear that *
>>> messageInstance.newBuilderForType().mergeFrom(input).build();* is the
>>> correct approach.  What could possibly be going wrong if the stream I'm
>>> trying to deserialize from contains trailing binary data?
>>>
>>> Thanks ahead of time for your help.
>>>
>>> Alex
>>>
>>>
>>> On Tue, Sep 9, 2008 at 10:47 AM, Alex Loddengaard <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> After taking my code out of Hadoop, it looks as though my deserializing
>>>> mechanism is working fine.  My problem lies with my integration with
>>>> Hadoop.
>>>>
>>>> Thanks for resolving this issue, Kenton!
>>>>
>>>> Alex
>>>>
>>>>
>>>> On Tue, Sep 9, 2008 at 9:42 AM, Alex Loddengaard <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> To clarify:  In my original message I was saying that you should call
>>>>>> isInitialized() on the builder returned by mergeFrom(), to make sure the
>>>>>> parsed message is complete, before you call build().
>>>>>>
>>>>>
>>>>> Ah.  Now isInitialized() is returning true, though I'm still having
>>>>> problems deserializing.  Now that I'm using OutputStream and InputStream,
>>>>> I'm getting the following exception:
>>>>>
>>>>> com.google.protobuf.InvalidProtocolBufferException: Protocol message
>>>>> tag had invalid wire type.
>>>>>
>>>>> I'm going to take my code out of Hadoop to see if Hadoop is causing
>>>>> these issues.  I'm still wary of that, though, because other
>>>>> serialization frameworks such as Facebook's Thrift seem to work in the
>>>>> framework that I am using.
>>>>>
>>>>> Thanks for your help, Kenton!  I'll check back soon with my progress.
>>>>>
>>>>
>>>>
>>>
>>
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-09 Thread Kenton Varda
On Mon, Sep 8, 2008 at 10:52 PM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:

> I should revise my problem slightly.  I had said that I am given an
> instance of a Message class when deserializing.  This is true, though
> sometimes that instance is null.  In the cases when it's null, I'm not able
> to call newBuilderForType() on it.  I'm not able to call
> getDefaultInstance(), either.  This is now problematic, though there may be
> a work around.  Also given to me is a Class instance of the Message.  I'm
> using Reflection to instantiate a new Message instance, then
> getDefaultInstance() to get the default instance, and then I'm calling
> newBuilderForType().  Is this problematic?
>

Hmm, I think the framework you are using is poorly designed -- it should
always give you a non-null default instance.  Using Java reflection is ugly.

getDefaultInstance() is actually a static method.  So, you don't have to
instantiate a new message instance first -- just call the static method
without an instance.  You can't actually instantiate the message class
directly anyway, since the constructors are private.
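
A small sketch of calling a static getDefaultInstance() when only a Class object is available, without constructing an instance first. StandInMessage is a hypothetical stand-in for a generated class (private constructor, public static getDefaultInstance()), and DefaultInstances is likewise an illustrative name; neither comes from the protobuf library.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a protocol-compiler-generated message class.
final class StandInMessage {
    private static final StandInMessage DEFAULT = new StandInMessage();

    private StandInMessage() {} // cannot be instantiated from outside

    public static StandInMessage getDefaultInstance() {
        return DEFAULT;
    }
}

final class DefaultInstances {
    private DefaultInstances() {}

    // Look up the static method on the class and invoke it with a null
    // receiver; no instance of the message class is ever created here.
    static Object defaultInstanceOf(Class<?> messageClass) {
        try {
            Method method = messageClass.getMethod("getDefaultInstance");
            return method.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "no usable static getDefaultInstance() on " + messageClass, e);
        }
    }
}
```

When the concrete type is known at compile time, the direct static call is simpler and avoids reflection entirely.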


>
>
> Thanks again.  Sorry for all the spam!
>
> Alex
>
>
> On Tue, Sep 9, 2008 at 12:11 PM, Alex Loddengaard <
> [EMAIL PROTECTED]> wrote:
>
>> I have a follow-up question:
>>
>> Will using *messageInstance.newBuilderForType().mergeFrom(input).build();
>> * work for a stream that contains trailing binary information?
>>
>> I'm asking this question for the following reason: I'm using a very simple
>> example where my Message just contains a single String.  When I print the
>> serialized message with a value of "my_string", I get "[binary]my_string".
>> Now, when I see the stream coming in on the deserialization side, I get
>> "[binary]my_string[binary]".  The leading binary is the same as the original,
>> however the trailing binary is something new entirely.  The trailing binary
>> is probably being created by Hadoop.
>>
>> Kenton, you have made it very clear that *
>> messageInstance.newBuilderForType().mergeFrom(input).build();* is the
>> correct approach.  What could possibly be going wrong if the stream I'm
>> trying to deserialize from contains trailing binary data?
>>
>> Thanks ahead of time for your help.
>>
>> Alex
>>
>>
>> On Tue, Sep 9, 2008 at 10:47 AM, Alex Loddengaard <
>> [EMAIL PROTECTED]> wrote:
>>
>>> After taking my code out of Hadoop, it looks as though my deserializing
>>> mechanism is working fine.  My problem lies with my integration with Hadoop.
>>>
>>> Thanks for resolving this issue, Kenton!
>>>
>>> Alex
>>>
>>>
>>> On Tue, Sep 9, 2008 at 9:42 AM, Alex Loddengaard <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> To clarify:  In my original message I was saying that you should call
>>>>> isInitialized() on the builder returned by mergeFrom(), to make sure the
>>>>> parsed message is complete, before you call build().
>>>>>
>>>>
>>>> Ah.  Now isInitialized() is returning true, though I'm still having
>>>> problems deserializing.  Now that I'm using OutputStream and InputStream,
>>>> I'm getting the following exception:
>>>>
>>>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>>>> had invalid wire type.
>>>>
>>>> I'm going to take my code out of Hadoop to see if Hadoop is causing
>>>> these issues.  I'm still wary of that, though, because other serialization
>>>> frameworks such as Facebook's Thrift seem to work in the framework that I
>>>> am using.
>>>>
>>>> Thanks for your help, Kenton!  I'll check back soon with my progress.
>>>>
>>>
>>>
>>
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-09 Thread Kenton Varda
On Mon, Sep 8, 2008 at 9:11 PM, Alex Loddengaard
<[EMAIL PROTECTED]> wrote:

> I have a follow-up question:
>
> Will using
> *messageInstance.newBuilderForType().mergeFrom(input).build();* work for a
> stream that contains trailing binary information?
>

No, it won't work.  Protocol buffers are not self-delimiting.  They assume
that the input you provide is supposed to be one complete message, not a
message possibly followed by other stuff.

You will need to somehow communicate the size of the message and make sure
to limit the input to that size.
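
One way to "limit the input to that size" is to wrap the stream so that it reports end-of-stream after a fixed number of bytes. The sketch below assumes the size is already known from elsewhere; BoundedInputStream and drain() are hypothetical names, not protobuf classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper: exposes at most `limit` bytes of the underlying stream.
final class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // pretend EOF once the message size is used up
        }
        int b = super.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the counter
    }

    // Demo helper: read a stream to its (possibly artificial) end.
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[2];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A parser given such a wrapper sees only the message bytes, and whatever follows stays unread in the underlying stream.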




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Alex Loddengaard
One more follow-up (sorry for all these follow-ups):

I should revise my problem slightly.  I had said that I am given an instance
of a Message class when deserializing.  This is true, though sometimes that
instance is null.  In the cases when it's null, I'm not able to call
newBuilderForType() on it.  I'm not able to call getDefaultInstance(),
either.  This is now problematic, though there may be a workaround.  Also
given to me is a Class instance of the Message.  I'm using Reflection to
instantiate a new Message instance, then getDefaultInstance() to get the
default instance, and then I'm calling newBuilderForType().  Is this
problematic?

Thanks again.  Sorry for all the spam!

Alex

On Tue, Sep 9, 2008 at 12:11 PM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:

> I have a follow-up question:
>
> Will using
> *messageInstance.newBuilderForType().mergeFrom(input).build();* work for a
> stream that contains trailing binary information?
>
> I'm asking this question for the following reason: I'm using a very simple
> example where my Message just contains a single String.  When I print the
> serialized message with a value of "my_string", I get "[binary]my_string".
> Now, when I see the stream coming in on the deserialization side, I get
> "[binary]my_string[binary]".  The leading binary is the same as the original,
> however the trailing binary is something new entirely.  The trailing binary
> is probably being created by Hadoop.
>
> Kenton, you have made it very clear that *
> messageInstance.newBuilderForType().mergeFrom(input).build();* is the
> correct approach.  What could possibly be going wrong if the stream I'm
> trying to deserialize from contains trailing binary data?
>
> Thanks ahead of time for your help.
>
> Alex
>
>
> On Tue, Sep 9, 2008 at 10:47 AM, Alex Loddengaard <
> [EMAIL PROTECTED]> wrote:
>
>> After taking my code out of Hadoop, it looks as though my deserializing
>> mechanism is working fine.  My problem lies with my integration with Hadoop.
>>
>> Thanks for resolving this issue, Kenton!
>>
>> Alex
>>
>>
>> On Tue, Sep 9, 2008 at 9:42 AM, Alex Loddengaard <
>> [EMAIL PROTECTED]> wrote:
>>
>>> On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>>
>>>> To clarify:  In my original message I was saying that you should call
>>>> isInitialized() on the builder returned by mergeFrom(), to make sure the
>>>> parsed message is complete, before you call build().
>>>>
>>>
>>> Ah.  Now isInitialized() is returning true, though I'm still having
>>> problems deserializing.  Now that I'm using OutputStream and InputStream,
>>> I'm getting the following exception:
>>>
>>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>>> had invalid wire type.
>>>
>>> I'm going to take my code out of Hadoop to see if Hadoop is causing these
>>> issues.  I'm still wary of that, though, because other serialization
>>> frameworks such as Facebook's Thrift seem to work in the framework that I am
>>> using.
>>>
>>> Thanks for your help, Kenton!  I'll check back soon with my progress.
>>>
>>
>>
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Alex Loddengaard
I have a follow-up question:

Will using *messageInstance.newBuilderForType().mergeFrom(input).build();*
work for a stream that contains trailing binary information?

I'm asking this question for the following reason: I'm using a very simple
example where my Message just contains a single String.  When I print the
serialized message with a value of "my_string", I get "[binary]my_string".
Now, when I see the stream coming in on the deserialization side, I get
"[binary]my_string[binary]".  The leading binary is the same as the original,
however the trailing binary is something new entirely.  The trailing binary
is probably being created by Hadoop.

Kenton, you have made it very clear that *
messageInstance.newBuilderForType().mergeFrom(input).build();* is the
correct approach.  What could possibly be going wrong if the stream I'm
trying to deserialize from contains trailing binary data?

Thanks ahead of time for your help.

Alex

On Tue, Sep 9, 2008 at 10:47 AM, Alex Loddengaard <[EMAIL PROTECTED]> wrote:

> After taking my code out of Hadoop, it looks as though my deserializing
> mechanism is working fine.  My problem lies with my integration with Hadoop.
>
> Thanks for resolving this issue, Kenton!
>
> Alex
>
>
> On Tue, Sep 9, 2008 at 9:42 AM, Alex Loddengaard <
> [EMAIL PROTECTED]> wrote:
>
>> On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>
>>> To clarify:  In my original message I was saying that you should call
>>> isInitialized() on the builder returned by mergeFrom(), to make sure the
>>> parsed message is complete, before you call build().
>>>
>>
>> Ah.  Now isInitialized() is returning true, though I'm still having
>> problems deserializing.  Now that I'm using OutputStream and InputStream,
>> I'm getting the following exception:
>>
>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>> had invalid wire type.
>>
>> I'm going to take my code out of Hadoop to see if Hadoop is causing these
>> issues.  I'm still wary of that, though, because other serialization
>> frameworks such as Facebook's Thrift seem to work in the framework that I am
>> using.
>>
>> Thanks for your help, Kenton!  I'll check back soon with my progress.
>>
>
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Alex Loddengaard
After taking my code out of Hadoop, it looks as though my deserializing
mechanism is working fine.  My problem lies with my integration with Hadoop.

Thanks for resolving this issue, Kenton!

Alex

On Tue, Sep 9, 2008 at 9:42 AM, Alex Loddengaard
<[EMAIL PROTECTED]> wrote:

> On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> To clarify:  In my original message I was saying that you should call
>> isInitialized() on the builder returned by mergeFrom(), to make sure the
>> parsed message is complete, before you call build().
>>
>
> Ah.  Now isInitialized() is returning true, though I'm still having
> problems deserializing.  Now that I'm using OutputStream and InputStream,
> I'm getting the following exception:
>
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
> had invalid wire type.
>
> I'm going to take my code out of Hadoop to see if Hadoop is causing these
> issues.  I'm still wary of that, though, because other serialization
> frameworks such as Facebook's Thrift seem to work in the framework that I am
> using.
>
> Thanks for your help, Kenton!  I'll check back soon with my progress.
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Alex Loddengaard
On Tue, Sep 9, 2008 at 9:28 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> To clarify:  In my original message I was saying that you should call
> isInitialized() on the builder returned by mergeFrom(), to make sure the
> parsed message is complete, before you call build().
>

Ah.  Now isInitialized() is returning true, though I'm still having problems
deserializing.  Now that I'm using OutputStream and InputStream, I'm getting
the following exception:

com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had
invalid wire type.

I'm going to take my code out of Hadoop to see if Hadoop is causing these
issues.  I'm still wary of that, though, because other serialization
frameworks such as Facebook's Thrift seem to work in the framework that I am
using.

Thanks for your help, Kenton!  I'll check back soon with my progress.




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Kenton Varda
On Mon, Sep 8, 2008 at 6:27 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:

>
>
> On Mon, Sep 8, 2008 at 6:18 PM, Alex Loddengaard <
> [EMAIL PROTECTED]> wrote:
>
>> On Tue, Sep 9, 2008 at 1:16 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>
>>> That won't work.  DynamicMessage is a different class; it does not know
>>> how to instantiate the protocol-compiler-generated version of the class.
>>>  Instead, you should do:
>>>
>>>   Message result =
>>> messageInstance.newBuilderForType().mergeFrom(input).build();
>>>
>>> Actually, you should check isInitialized() before calling build(), or use
>>> buildPartial() instead, but that's a separate issue.
>>>
>>
>> I changed my deserializing code to use the above, but I'm getting the same
>> exception.  I also tried to call isInitialized() on the instance given to
>> me, and the instance is not initialized.
>>
>
> That message instance is probably a default instance.  isInitialized() will
> always be false on those, unless it has no required fields at all.
>

To clarify:  In my original message I was saying that you should call
isInitialized() on the builder returned by mergeFrom(), to make sure the
parsed message is complete, before you call build().


>
>>
>>
>>>  The protocol compiler would not allow you to use tag zero anyway.  It
>>> looks like your input data is not identical to the data written by the
>>> sender.
>>>
>>
>> I'm confident that the sender data is the same data that is created when I
>> serialize.
>>
>
> The exceptions that you're reporting strongly suggest that you are *not*
> seeing the same data on both ends.  Try this:  serialize to a byte array or
> ByteString, compute a checksum of some sort for debugging, then write the
> bytes to your output.  On the other end, read the bytes back into a
> ByteString or byte array, checksum again, and see if it's the same.  Then
> parse from that.  I'm pretty confident that if the checksums are the same,
> you will not see the error you're seeing.
>
>
>>   Perhaps I'm serializing incorrectly?  I'm creating a CodedOutputStream
>> given an OutputStream and passing that to writeTo.
>>
>
> This is redundant -- you can just pass the OutputStream to writeTo().
>
>
>>   However, I'm not using a CodedInputStream to deserialize.  Should I be
>> using Coded or non-Coded streams?
>>
>
> It doesn't matter.  If given a normal stream, mergeFrom() / writeTo() will
> wrap it in a coded stream on their own.
>
>




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Kenton Varda
On Mon, Sep 8, 2008 at 6:18 PM, Alex Loddengaard
<[EMAIL PROTECTED]> wrote:

> On Tue, Sep 9, 2008 at 1:16 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> That won't work.  DynamicMessage is a different class; it does not know
>> how to instantiate the protocol-compiler-generated version of the class.
>>  Instead, you should do:
>>
>>   Message result =
>> messageInstance.newBuilderForType().mergeFrom(input).build();
>>
>> Actually, you should check isInitialized() before calling build(), or use
>> buildPartial() instead, but that's a separate issue.
>>
>
> I changed my deserializing code to use the above, but I'm getting the same
> exception.  I also tried to call isInitialized() on the instance given to
> me, and the instance is not initialized.
>

That message instance is probably a default instance.  isInitialized() will
always be false on those, unless it has no required fields at all.


>
>
>> The protocol compiler would not allow you to use tag zero anyway.  It
>> looks like your input data is not identical to the data written by the
>> sender.
>>
>
> I'm confident that the sender data is the same data that is created when I
> serialize.
>

The exceptions that you're reporting strongly suggest that you are *not*
seeing the same data on both ends.  Try this:  serialize to a byte array or
ByteString, compute a checksum of some sort for debugging, then write the
bytes to your output.  On the other end, read the bytes back into a
ByteString or byte array, checksum again, and see if it's the same.  Then
parse from that.  I'm pretty confident that if the checksums are the same,
you will not see the error you're seeing.
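
The checksum comparison described above can be sketched roughly as follows.
This is a minimal illustration using only the JDK, not code from the thread:
the class name ChecksumDebug and its helpers are hypothetical, CRC32 is just
one convenient checksum, the three literal bytes stand in for whatever
message.toByteArray() produces, and a ByteArrayInputStream stands in for the
real transport.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical debugging helper; not part of the protobuf library.
final class ChecksumDebug {

    // CRC32 of the whole array, returned as an unsigned 32-bit value.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Drains the stream to EOF so the checksum covers exactly what arrived.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sender side: in real code these bytes would come from
        // message.toByteArray() (assumed stand-in bytes here).
        byte[] sent = {0x08, (byte) 0x96, 0x01};
        System.out.println("sender:   " + sent.length + " bytes, crc32="
                + Long.toHexString(checksum(sent)));

        // Receiver side: the ByteArrayInputStream stands in for the transport.
        byte[] received = readAll(new ByteArrayInputStream(sent));
        System.out.println("receiver: " + received.length + " bytes, crc32="
                + Long.toHexString(checksum(received)));

        // Parse from `received` only once both lines report the same values.
    }
}
```

Logging the length next to the checksum is a deliberate choice: a mismatch in
length alone already shows that the framework added, dropped, or padded bytes.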


>   Perhaps I'm serializing incorrectly?  I'm creating a CodedOutputStream
> given an OutputStream and passing that to writeTo.
>

This is redundant -- you can just pass the OutputStream to writeTo().


>   However, I'm not using a CodedInputStream to deserialize.  Should I be
> using Coded or non-Coded streams?
>

It doesn't matter.  If given a normal stream, mergeFrom() / writeTo() will
wrap it in a coded stream on their own.




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Alex Loddengaard
On Tue, Sep 9, 2008 at 1:16 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> That won't work.  DynamicMessage is a different class; it does not know how
> to instantiate the protocol-compiler-generated version of the class.
>  Instead, you should do:
>
>   Message result =
> messageInstance.newBuilderForType().mergeFrom(input).build();
>
> Actually, you should check isInitialized() before calling build(), or use
> buildPartial() instead, but that's a separate issue.
>

I changed my deserializing code to use the above, but I'm getting the same
exception.  I also tried to call isInitialized() on the instance given to
me, and the instance is not initialized.  That is, isInitialized() returned
false.  I'm plugging in to a large framework that I'm not entirely familiar
with (Hadoop), so I can only speculate what's going on here.  I think that
the Message instance given to me was created with reflection and is not a
valid Message.  I'm making this claim because isInitialized() is returning
false.

Is there any other way to deserialize?  Can you provide any other good
approaches to debugging this?  In the meantime, I'm going to take my example
out of the large framework in hopes of better understanding the problem I'm
having.


> The protocol compiler would not allow you to use tag zero anyway.  It looks
> like your input data is not identical to the data written by the sender.
>

I'm confident that the sender data is the same data that is created when I
serialize.  Perhaps I'm serializing incorrectly?  I'm creating a
CodedOutputStream given an OutputStream and passing that to writeTo.
However, I'm not using a CodedInputStream to deserialize.  Should I be using
Coded or non-Coded streams?

I stopped using CodedOutputStream when serializing and got the following
exception:

com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:62)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:410)
    at com.google.protobuf.FieldSet.mergeFieldFrom(FieldSet.java:454)
    at com.google.protobuf.FieldSet.mergeFrom(FieldSet.java:402)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:248)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:240)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:329)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:184)

Thanks for your help, Kenton!  I have a good feeling that I'm almost there :).

Alex




Re: Deserializing Messages of unknown type at compile-time

2008-09-08 Thread Kenton Varda
On Mon, Sep 8, 2008 at 4:33 AM, <[EMAIL PROTECTED]> wrote:

> More tricky.  Given a stream and an instance, I'm trying to get the
> Descriptor by calling Message#getDescriptorForType() on the instance
> and passing the return value, along with an input stream, to
> DynamicMessage#parseFrom(Descriptor,input).  I then cast the
> DynamicMessage that is returned by parseFrom to the same type of the
> instance that is given to me.


That won't work.  DynamicMessage is a different class; it does not know how
to instantiate the protocol-compiler-generated version of the class.
Instead, you should do:

  Message result = messageInstance.newBuilderForType().mergeFrom(input).build();

Actually, you should check isInitialized() before calling build(), or use
buildPartial() instead, but that's a separate issue.


> The problem that I'm encountering is during deserialization.  I'm
> getting an InvalidProtocolBufferException.  Here's the trace:
>
> ...
>
> What's curious about this is that my .proto files each only have one
> field in each Message, and each of those fields has a tag of 1.  None
> of my tags are 0.


The protocol compiler would not allow you to use tag zero anyway.  It looks
like your input data is not identical to the data written by the sender.
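
To make the "tag zero" point concrete: on the wire, every field starts with a
varint tag equal to (field_number << 3) | wire_type, so a parser that reads a
zero byte where it expects a tag sees field number 0. The sketch below is a
minimal illustration using only the JDK; the class TagDecode and its methods
are hypothetical (the real parsing is done by CodedInputStream), and the
value 150 for LongMessage.value is an assumed example, not data from this
thread.

```java
// Hypothetical decoder for illustration only; not the protobuf library's code.
final class TagDecode {

    // Reads one base-128 varint from buf starting at pos[0], advancing pos[0].
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // high bit clear: last byte of the varint
            }
        }
        throw new IllegalArgumentException("varint longer than 10 bytes");
    }

    // The upper bits of a tag hold the field number.
    static int fieldNumber(long tag) {
        return (int) (tag >>> 3);
    }

    // The low three bits of a tag hold the wire type.
    static int wireType(long tag) {
        return (int) (tag & 0x7);
    }

    // Wire types 0 through 5 are defined by the encoding; 6 and 7 are not.
    static boolean isDefinedWireType(int wireType) {
        return wireType >= 0 && wireType <= 5;
    }

    static String describeFirstTag(byte[] data) {
        long tag = readVarint(data, new int[] {0});
        return "field=" + fieldNumber(tag) + " wireType=" + wireType(tag);
    }

    public static void main(String[] args) {
        // LongMessage { value = 150 } encodes as tag 0x08, then varint 0x96 0x01.
        byte[] good = {0x08, (byte) 0x96, 0x01};
        System.out.println(describeFirstTag(good));    // field=1 wireType=0

        // The same bytes behind one stray zero byte: the "tag" is now zero.
        byte[] shifted = {0x00, 0x08, (byte) 0x96, 0x01};
        System.out.println(describeFirstTag(shifted)); // field=0 wireType=0
    }
}
```

Under these assumptions, a stray or missing byte in front of the message is
enough to produce either symptom: a leading 0x00 decodes as tag zero, and a
byte whose low three bits are 6 or 7 decodes as an undefined wire type.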




Deserializing Messages of unknown type at compile-time

2008-09-08 Thread alexloddengaard

I have a scenario where I'm trying to create a Serializer and
Deserializer class that can handle any general Message, given a stream
(InputStream or OutputStream) and an instance of a particular Message
implementation.

How can I use this information to serialize and deserialize?  I will
break things down slightly more:

Serializing:
This seems easy.  Given a stream and an instance, just call
Message#writeTo(output) on the instance.

Deserializing:
More tricky.  Given a stream and an instance, I'm trying to get the
Descriptor by calling Message#getDescriptorForType() on the instance
and passing the return value, along with an input stream, to
DynamicMessage#parseFrom(Descriptor,input).  I then cast the
DynamicMessage that is returned by parseFrom to the same type as the
instance that is given to me.

The problem that I'm encountering is during deserialization.  I'm
getting an InvalidProtocolBufferException.  Here's the trace:

com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
    at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:52)
    at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:67)
    at com.google.protobuf.FieldSet.mergeFrom(FieldSet.java:397)
    at com.google.protobuf.DynamicMessage$Builder.mergeFrom(DynamicMessage.java:289)
    at com.google.protobuf.DynamicMessage$Builder.mergeFrom(DynamicMessage.java:213)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:240)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:329)
    at com.google.protobuf.DynamicMessage.parseFrom(DynamicMessage.java:102)

What's curious about this is that my .proto files each only have one
field in each Message, and each of those fields has a tag of 1.  None
of my tags are 0.

I have a feeling that I'm probably misusing the API for
deserialization, or perhaps I may have mis-defined my .proto files.
Here's an example of a .proto file that I'm using:

message LongMessage {
  required int64 value = 1;
}

Any and all help is greatly appreciated.  Thanks ahead of time for
your help :).

Alex
