Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Jason Hsueh
Sorry, I guess it's not clear if you know the message type or not. But the
issue is that you're getting some arbitrary message type out of the
transmitted FileDescriptorSet (in fact, any FileDescriptorSet that's ever
been received), not the type you want.

It's not clear from your code how the C++ code sends the same message type
and data, and how the listener manages to know what type it is. But the type
determination snippet that you provided is incorrect: the type name needs to
be specified by the sender rather than inferred.

On Sat, Aug 27, 2011 at 7:08 PM, Jason Hsueh  wrote:

>
>
> On Sat, Aug 27, 2011 at 6:00 PM, Nader Salehi wrote:
>
>> Thanks for looking into the code.  Please see my comments inline.
>>
>> On 8/27/2011 17:07 Jason Hsueh writes:
>> > No, the issue is one and the same. You cannot receive arbitrary
>> serialized
>> > bytes and determine what the message type is. You can do something
>> > heuristically, but there is no way for you to be absolutely certain of
>> the
>> > message type simply by looking at encoded bytes.
>>
>> Except that in this case, there is only one message.  Both senders are
>> sending the exact same message type.  The difference is that messages
>> coming from the Java sender won't get their field names displayed.
>>
>
> My point is that in the C++ client code, you are likely using the wrong
> type. Instead of the actual message type, you get the first message type in
> the transitive dependencies for which IsInitialized() succeeds after
> ParsePartialFromString(). Given that you know the message type, you should
> instantiate that type explicitly, rather than iterate over the list of all
> known message types.
>
> If you have the wrong message definition, then fields that don't match the
> expected (number, wire type) pairs are treated as unknown fields. When you
> print it out with DebugString(), these fields are printed using the tag
> number.
>
>>
>> >
>> > Specific points in the code:
>> >
>> > > C++
>> > > void
>> > > MessageDetector::insert (const pb::FileDescriptorSet &
>> fDescSet,
>> > > const string &
>>  typeName)
>> > > {
>> > >vector::iterator iter =
>> > >std::find_if(dbContainerVec.begin(),
>> > > dbContainerVec.end(),
>> > > boost::bind(&DBContainer::operator(), _1)
>> ==
>> > > typeName);
>> > >if (iter == dbContainerVec.end()) {
>> > >// Need to add the file descriptor set to the data base
>> > >cout << "Adding " << typeName << endl;
>> > >cout << fDescSet.DebugString() << endl;
>> > >shared_ptr
>> > >db(new pb::SimpleDescriptorDatabase);
>> > >for (int i = 0; i < fDescSet.file_size(); ++i) {
>> > >assert(db->Add(fDescSet.file(i)));
>> > >for (int j = 0; j <
>> fDescSet.file(i).message_type_size();
>> > ++j)
>> > > {
>> > >string formalName;
>> > >formalName += fDescSet.file(i).has_package() ?
>> > >fDescSet.file(i).package() + "." : "";
>> > >formalName +=
>> fDescSet.file(i).message_type(j).name
>> > ();
>> > >DBContainerPtr dbContainerElem(new DBContainer
>> > (formalName,
>> > >
>> db));
>> > >dbContainerVec.push_back(dbContainerElem);
>> >
>> > Here you add every message type found in the FileDescriptorSet into the
>> list
>> > of names.
>> >
>>
>> And it is the correct approach, isn't it?
>>
>> >
>> > >}
>> > >}
>> > >}
>> > > }
>> > >
>> > > void
>> > > MessageDetector::discoverAndDump (const string & msg)
>> > > {
>> > >bool found = false;
>> > >for (int i = dbContainerVec.size() - 1; !found && i >= 0;
>> --i) {
>> > >pb::DynamicMessageFactory factory;
>> > >const pb::Descriptor * desc =
>> > >dbContainerVec[i]->pool.FindMessageTypeByName
>> > >(dbContainerVec[i]->typeName);
>> > >std::auto_ptr
>> > >revEngMessage(factory.GetPrototype(desc)->New());
>> > >revEngMessage->ParsePartialFromString(msg);
>> >
>> > I glossed over the fact that you had actually used
>> ParsePartialFromString and
>> > call IsInitialized separately, but the effect is the same.
>> > ParsePartialFromString will succeed on any bytes that are in the
>> protocol
>> > buffer encoding format. Again, this is true regardless of the source and
>> > target message types.
>>
>> Agreed.  I just use parse partial only to avoid G's standard error
>> message when something can't be parsed.
>>
>> >
>> > >if (revEngMessage->IsInitialized()) {
>> >
>> > This call _may_

Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Jason Hsueh
On Sat, Aug 27, 2011 at 6:00 PM, Nader Salehi  wrote:

> Thanks for looking into the code.  Please see my comments inline.
>
> On 8/27/2011 17:07 Jason Hsueh writes:
> > No, the issue is one and the same. You cannot receive arbitrary
> serialized
> > bytes and determine what the message type is. You can do something
> > heuristically, but there is no way for you to be absolutely certain of
> the
> > message type simply by looking at encoded bytes.
>
> Except that in this case, there is only one message.  Both senders are
> sending the exact same message type.  The difference is that messages
> coming from the Java sender won't get their field names displayed.
>

My point is that in the C++ client code, you are likely using the wrong
type. Instead of the actual message type, you get the first message type in
the transitive dependencies for which IsInitialized() succeeds after
ParsePartialFromString(). Given that you know the message type, you should
instantiate that type explicitly, rather than iterate over the list of all
known message types.

If you have the wrong message definition, then fields that don't match the
expected (number, wire type) pairs are treated as unknown fields. When you
print it out with DebugString(), these fields are printed using the tag
number.

>
> >
> > Specific points in the code:
> >
> > > C++
> > > void
> > > MessageDetector::insert (const pb::FileDescriptorSet &
> fDescSet,
> > > const string &typeName)
> > > {
> > >vector::iterator iter =
> > >std::find_if(dbContainerVec.begin(),
> > > dbContainerVec.end(),
> > > boost::bind(&DBContainer::operator(), _1)
> ==
> > > typeName);
> > >if (iter == dbContainerVec.end()) {
> > >// Need to add the file descriptor set to the data base
> > >cout << "Adding " << typeName << endl;
> > >cout << fDescSet.DebugString() << endl;
> > >shared_ptr
> > >db(new pb::SimpleDescriptorDatabase);
> > >for (int i = 0; i < fDescSet.file_size(); ++i) {
> > >assert(db->Add(fDescSet.file(i)));
> > >for (int j = 0; j <
> fDescSet.file(i).message_type_size();
> > ++j)
> > > {
> > >string formalName;
> > >formalName += fDescSet.file(i).has_package() ?
> > >fDescSet.file(i).package() + "." : "";
> > >formalName +=
> fDescSet.file(i).message_type(j).name
> > ();
> > >DBContainerPtr dbContainerElem(new DBContainer
> > (formalName,
> > >
> db));
> > >dbContainerVec.push_back(dbContainerElem);
> >
> > Here you add every message type found in the FileDescriptorSet into the
> list
> > of names.
> >
>
> And it is the correct approach, isn't it?
>
> >
> > >}
> > >}
> > >}
> > > }
> > >
> > > void
> > > MessageDetector::discoverAndDump (const string & msg)
> > > {
> > >bool found = false;
> > >for (int i = dbContainerVec.size() - 1; !found && i >= 0;
> --i) {
> > >pb::DynamicMessageFactory factory;
> > >const pb::Descriptor * desc =
> > >dbContainerVec[i]->pool.FindMessageTypeByName
> > >(dbContainerVec[i]->typeName);
> > >std::auto_ptr
> > >revEngMessage(factory.GetPrototype(desc)->New());
> > >revEngMessage->ParsePartialFromString(msg);
> >
> > I glossed over the fact that you had actually used ParsePartialFromString
> and
> > call IsInitialized separately, but the effect is the same.
> > ParsePartialFromString will succeed on any bytes that are in the protocol
> > buffer encoding format. Again, this is true regardless of the source and
> > target message types.
>
> Agreed.  I just use parse partial only to avoid G's standard error
> message when something can't be parsed.
>
> >
> > >if (revEngMessage->IsInitialized()) {
> >
> > This call _may_ return true if the message type has no required fields,
> or the
> > serialized bytes happen to contain values that correspond to the defined
> > required fields. It does not imply that the serialized bytes represent a
> > message of type dbContainerVec[i]. As such, you are likely breaking out
> of the
> > loop with the wrong message type.
>
> Correct.
>
> >
> >
> > >cout << revEngMessage->DebugString() << endl;
> > >found = true;
> > >}
> > >}
> > >
> > >if (!found) {
> > >cout << (dbContainerVec.size() == 0 ?
> > > "No valid announcement has been received" :
> > > 

Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Nader Salehi
Thanks for looking into the code.  Please see my comments inline.

On 8/27/2011 17:07 Jason Hsueh writes:
> No, the issue is one and the same. You cannot receive arbitrary serialized
> bytes and determine what the message type is. You can do something
> heuristically, but there is no way for you to be absolutely certain of the
> message type simply by looking at encoded bytes.

Except that in this case, there is only one message.  Both senders are
sending the exact same message type.  The difference is that messages
coming from the Java sender won't get their field names displayed.

> 
> Specific points in the code:
> 
> >     C++
> >     void
> >     MessageDetector::insert (const pb::FileDescriptorSet & fDescSet,
> >                             const string &                typeName)
> >     {
> >        vector::iterator iter =
> >            std::find_if(dbContainerVec.begin(),
> >                         dbContainerVec.end(),
> >                         boost::bind(&DBContainer::operator(), _1) ==
> >     typeName);
> >        if (iter == dbContainerVec.end()) {
> >            // Need to add the file descriptor set to the data base
> >            cout << "Adding " << typeName << endl;
> >            cout << fDescSet.DebugString() << endl;
> >            shared_ptr
> >                db(new pb::SimpleDescriptorDatabase);
> >            for (int i = 0; i < fDescSet.file_size(); ++i) {
> >                assert(db->Add(fDescSet.file(i)));
> >                for (int j = 0; j < fDescSet.file(i).message_type_size();
> ++j)
> >     {
> >                    string formalName;
> >                    formalName += fDescSet.file(i).has_package() ?
> >                        fDescSet.file(i).package() + "." : "";
> >                    formalName += fDescSet.file(i).message_type(j).name
> ();
> >                    DBContainerPtr dbContainerElem(new DBContainer
> (formalName,
> >                                                                   db));
> >                    dbContainerVec.push_back(dbContainerElem);
> 
> Here you add every message type found in the FileDescriptorSet into the list
> of names.
>  

And it is the correct approach, isn't it?

> 
> >                }
> >            }
> >        }
> >     }
> >
> >     void
> >     MessageDetector::discoverAndDump (const string & msg)
> >     {
> >        bool found = false;
> >        for (int i = dbContainerVec.size() - 1; !found && i >= 0; --i) {
> >            pb::DynamicMessageFactory factory;
> >            const pb::Descriptor * desc =
> >                dbContainerVec[i]->pool.FindMessageTypeByName
> >                (dbContainerVec[i]->typeName);
> >            std::auto_ptr
> >                revEngMessage(factory.GetPrototype(desc)->New());
> >            revEngMessage->ParsePartialFromString(msg);
> 
> I glossed over the fact that you had actually used ParsePartialFromString and
> call IsInitialized separately, but the effect is the same.
> ParsePartialFromString will succeed on any bytes that are in the protocol
> buffer encoding format. Again, this is true regardless of the source and
> target message types.

Agreed.  I just use parse partial only to avoid G's standard error
message when something can't be parsed.

> 
> >            if (revEngMessage->IsInitialized()) {
> 
> This call _may_ return true if the message type has no required fields, or the
> serialized bytes happen to contain values that correspond to the defined
> required fields. It does not imply that the serialized bytes represent a
> message of type dbContainerVec[i]. As such, you are likely breaking out of the
> loop with the wrong message type.

Correct.

>  
> 
> >                cout << revEngMessage->DebugString() << endl;
> >                found = true;
> >            }
> >        }
> >
> >        if (!found) {
> >            cout << (dbContainerVec.size() == 0 ?
> >                     "No valid announcement has been received" :
> >                     "No message found for the message")
> >                 << endl;
> >        }
> >
> >     --
> >
> >     --
> >     You received this message because you are subscribed to the Google
> Groups
> >     "Protocol Buffers" group.
> >     To post to this group, send email to protobuf@googlegroups.com.
> >     To unsubscribe from this group, send email to
> >     protobuf+unsubscr...@googlegroups.com.
> >     For more options, visit this group at http://groups.google.com/group
> /
> >     protobuf?hl=en.
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups
> > "Protocol Buffers" group.
> > To post to this group, send email to protobuf@googlegroups.com.
> > To unsubscribe fro

Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Jason Hsueh
No, the issue is one and the same. You cannot receive arbitrary serialized
bytes and determine what the message type is. You can do something
heuristically, but there is no way for you to be absolutely certain of the
message type simply by looking at encoded bytes.

Specific points in the code:

> C++
> > void
> > MessageDetector::insert (const pb::FileDescriptorSet & fDescSet,
> > const string &typeName)
> > {
> >vector::iterator iter =
> >std::find_if(dbContainerVec.begin(),
> > dbContainerVec.end(),
> > boost::bind(&DBContainer::operator(), _1) ==
> > typeName);
> >if (iter == dbContainerVec.end()) {
> >// Need to add the file descriptor set to the data base
> >cout << "Adding " << typeName << endl;
> >cout << fDescSet.DebugString() << endl;
> >shared_ptr
> >db(new pb::SimpleDescriptorDatabase);
> >for (int i = 0; i < fDescSet.file_size(); ++i) {
> >assert(db->Add(fDescSet.file(i)));
> >for (int j = 0; j < fDescSet.file(i).message_type_size();
> ++j)
> > {
> >string formalName;
> >formalName += fDescSet.file(i).has_package() ?
> >fDescSet.file(i).package() + "." : "";
> >formalName += fDescSet.file(i).message_type(j).name();
> >DBContainerPtr dbContainerElem(new
> DBContainer(formalName,
> >   db));
> >dbContainerVec.push_back(dbContainerElem);
>

Here you add every message type found in the FileDescriptorSet into the list
of names.


>  >}
> >}
> >}
> > }
> >
> > void
> > MessageDetector::discoverAndDump (const string & msg)
> > {
> >bool found = false;
> >for (int i = dbContainerVec.size() - 1; !found && i >= 0; --i) {
> >pb::DynamicMessageFactory factory;
> >const pb::Descriptor * desc =
> >dbContainerVec[i]->pool.FindMessageTypeByName
> >(dbContainerVec[i]->typeName);
> >std::auto_ptr
> >revEngMessage(factory.GetPrototype(desc)->New());
> >revEngMessage->ParsePartialFromString(msg);
>

I glossed over the fact that you had actually used ParsePartialFromString
and call IsInitialized separately, but the effect is the same.
ParsePartialFromString will succeed on any bytes that are in the protocol
buffer encoding format. Again, this is true regardless of the source and
target message types.

>if (revEngMessage->IsInitialized()) {
>

This call _may_ return true if the message type has no required fields, or
the serialized bytes happen to contain values that correspond to the defined
required fields. It does not imply that the serialized bytes represent a
message of type dbContainerVec[i]. As such, you are likely breaking out of
the loop with the wrong message type.


> >cout << revEngMessage->DebugString() << endl;
> >found = true;
> >}
> >}
> >
> >if (!found) {
> >cout << (dbContainerVec.size() == 0 ?
> > "No valid announcement has been received" :
> > "No message found for the message")
> > << endl;
> >}
> >
> > --
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups
> > "Protocol Buffers" group.
> > To post to this group, send email to protobuf@googlegroups.com.
> > To unsubscribe from this group, send email to
> > protobuf+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> http://groups.google.com/group/
> > protobuf?hl=en.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Protocol Buffers" group.
> > To post to this group, send email to protobuf@googlegroups.com.
> > To unsubscribe from this group, send email to
> > protobuf+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> http://groups.google.com/group/protobuf?
> > hl=en.
> >
>
> --
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Nader Salehi
If I understand correctly, you are talking about two separate issue.
One is that I use of ParseFromString() is unreliable.  The other one
is that I should switch to self describing messages.  The latter is
not feasible for now; I receive serialized PB messages that do not
contain their type name, and need to figure out what type they are.
In my particular setup, I only have one message type, and still cannot
figure out the messages that come from the Java sender.  Is there any
difference between the way things get serialized in Java vs. C++, or
is it a bug in my code?

Nader

On 8/27/2011 13:11 Jason Hsueh writes:
> It appears that the C++ listener code supports multiple types, but assumes
> that the correct type can be determined by checking if parsing the serialized
> string returns successfully. This is not a reliable way to detect message
> types: in fact ParseFromString() only fails if the serialized data is not a
> valid protocol buffer encoding (not type-specific), or if the message
> definition contains required fields and those fields were not seen in the
> serialized data. You should also transmit the message type name: http://
> code.google.com/apis/protocolbuffers/docs/techniques.html#self-description
> 
> On Fri, Aug 26, 2011 at 10:27 PM, Nader Salehi  wrote:
> 
> I am having difficulty understanding the problem I am having with
> parsing of some of the messages that I produce in my test environment.
> There seems to be some difference between how things are constructed
> and serialized in Java vs. C++.  That is, unless there is a bug in my
> code :)
>
> Here is how I have set things up:
>
>  * There is one Java sender, which periodically sends messages of type
>   A.  It also sends the FileDescriptorSet that is required for
>   parsing A.  Both events are serialized and sent over UDP packets
>
>  * There is also a C++ sender, which sends messages of the same type A
>   as the Java sender.  It does NOT, however, send the
>   FileDescriptorSet.  Like the Java version, it serializes the
>   message and sends it over UDP.
>
>  * There is a C++ listener, which listens to both senders.  It
>  successfully parse the FileDescriptorSet that the Java app sends,
>  creates SimpleDescriptorDatabase from it, and creates a
>  DescriptorPool using the SimpleDescriptorDatabase.
>
> The listener receives both message and the descriptor from the Java
> sender.  It is able to dynamically parse the message, but when it
> tries to display the message, using DebugString(), it fails to show
> field names.  Instead, it only shows field number.  But, it has
> absolutely no problem showing field names for messages it receives
> from the C++ sender.  Note that the file descriptor set comes from the
> Java sender.
>
> I am adding the Java module where it creates the file descriptor set.
> I also add the C++ code where it a) creates the descriptor pool and b)
> uses message factory to parse the message.  I would appreciate it if
> someone could give me a hint as to what might be in play.
>
> Thanks,
> Nader
>
> Java
> --
>        public static FileDescriptorSet getDescriptorSetFor(final Message
> aMessage) {
>                final FileDescriptor file = aMessage.getDescriptorForType
> ().getFile();
>                final List fileDescriptors = new ArrayList
> ();
>
>                Stack dependencyStack =
>                    new Stack();
>                FileDescriptorDependency fDescObj =
>                    new FileDescriptorDependency();
>                fDescObj.setFDesc(file);
>                fDescObj.setDependencyCnt(0);
>                dependencyStack.push(fDescObj);
>                while (!dependencyStack.empty()) {
>                    fDescObj = dependencyStack.pop();
>                    if (fDescObj.getDependencyCnt() ==
>                        fDescObj.getFDesc().getDependencies().size()) {
>
>                        // Time to add the file descriptor to the file
>                        // descriptor set
>                        fileDescriptors.add(fDescObj.getFDesc());
>                    } else {
>                        int idx = fDescObj.getDependencyCnt();
>                        fDescObj.setDependencyCnt(idx + 1);
>                        dependencyStack.push(fDescObj);
>                        List depList =
>                            fDescObj.getFDesc().getDependencies();
>                        FileDescriptorDependency fddObj =
>                            new FileDescriptorDependency();
>                        fddObj.setFDesc(depList.get(idx));
>                        fddObj.setDependencyCnt(0);
>                        dependen

Re: [protobuf] Dynamic Message Processing, Java vs. C++

2011-08-27 Thread Jason Hsueh
It appears that the C++ listener code supports multiple types, but assumes
that the correct type can be determined by checking if parsing the
serialized string returns successfully. This is not a reliable way to detect
message types: in fact ParseFromString() only fails if the serialized data
is not a valid protocol buffer encoding (not type-specific), or if the
message definition contains required fields and those fields were not seen
in the serialized data. You should also transmit the message type name:
http://code.google.com/apis/protocolbuffers/docs/techniques.html#self-description

On Fri, Aug 26, 2011 at 10:27 PM, Nader Salehi wrote:

> I am having difficulty understanding the problem I am having with
> parsing of some of the messages that I produce in my test environment.
> There seems to be some difference between how things are constructed
> and serialized in Java vs. C++.  That is, unless there is a bug in my
> code :)
>
> Here is how I have set things up:
>
>  * There is one Java sender, which periodically sends messages of type
>   A.  It also sends the FileDescriptorSet that is required for
>   parsing A.  Both events are serialized and sent over UDP packets
>
>  * There is also a C++ sender, which sends messages of the same type A
>   as the Java sender.  It does NOT, however, send the
>   FileDescriptorSet.  Like the Java version, it serializes the
>   message and sends it over UDP.
>
>  * There is a C++ listener, which listens to both senders.  It
>  successfully parse the FileDescriptorSet that the Java app sends,
>  creates SimpleDescriptorDatabase from it, and creates a
>  DescriptorPool using the SimpleDescriptorDatabase.
>
> The listener receives both message and the descriptor from the Java
> sender.  It is able to dynamically parse the message, but when it
> tries to display the message, using DebugString(), it fails to show
> field names.  Instead, it only shows field number.  But, it has
> absolutely no problem showing field names for messages it receives
> from the C++ sender.  Note that the file descriptor set comes from the
> Java sender.
>
> I am adding the Java module where it creates the file descriptor set.
> I also add the C++ code where it a) creates the descriptor pool and b)
> uses message factory to parse the message.  I would appreciate it if
> someone could give me a hint as to what might be in play.
>
> Thanks,
> Nader
>
> Java
> --
>public static FileDescriptorSet getDescriptorSetFor(final Message
> aMessage) {
>final FileDescriptor file =
> aMessage.getDescriptorForType().getFile();
>final List fileDescriptors = new
> ArrayList();
>
>Stack dependencyStack =
>new Stack();
>FileDescriptorDependency fDescObj =
>new FileDescriptorDependency();
>fDescObj.setFDesc(file);
>fDescObj.setDependencyCnt(0);
>dependencyStack.push(fDescObj);
>while (!dependencyStack.empty()) {
>fDescObj = dependencyStack.pop();
>if (fDescObj.getDependencyCnt() ==
>fDescObj.getFDesc().getDependencies().size()) {
>
>// Time to add the file descriptor to the file
>// descriptor set
>fileDescriptors.add(fDescObj.getFDesc());
>} else {
>int idx = fDescObj.getDependencyCnt();
>fDescObj.setDependencyCnt(idx + 1);
>dependencyStack.push(fDescObj);
>List depList =
>fDescObj.getFDesc().getDependencies();
>FileDescriptorDependency fddObj =
>new FileDescriptorDependency();
>fddObj.setFDesc(depList.get(idx));
>fddObj.setDependencyCnt(0);
>dependencyStack.push(fddObj);
>}
>}
>
>
>final FileDescriptorSet.Builder builder =
> FileDescriptorSet.newBuilder();
>for (FileDescriptor descriptor : fileDescriptors) {
>builder.addFile(descriptor.toProto());
>}
>return builder.build();
>}
>
>
> C++
>
> -
> 
>struct DBContainer {
>DBContainer(const std::string & name,
>boost::shared_ptr db) :
>typeName(name),
>descDB(db),
>pool(db.get())
>{
>}
>
>const std::string & operator()() const { return typeName; }
>
>std::string typeName;
>boost::shared_ptr descDB;
>pb::DescriptorPool