Status: New
Owner: liuj...@google.com
Labels: Type-Defect Priority-Medium

New issue 494 by r...@rdna.ru: In the Python API, google.protobuf.text_format.Merge() fails if the ``text'' param contains semicolon(s).
http://code.google.com/p/protobuf/issues/detail?id=494

What steps will reproduce the problem?

1. Create a simple .proto file and compile it:
19:47 0 user@host:~/pb_bug>>cat reproduce.proto
package reproduce;

message Person {
  optional int32 id = 1;
  optional string name = 2;
}

message People {
  repeated Person person = 1;
}
19:47 0 user@host:~/pb_bug>>protoc --python_out=. reproduce.proto
19:47 0 user@host:~/pb_bug>>

2. Create a simple Python script:
19:48 0 user@host:~/pb_bug>>cat reproduce.py
#!/usr/bin/env python

import sys
import reproduce_pb2
from google.protobuf.text_format import Merge

ppl_msg = reproduce_pb2.People()
with open(sys.argv[1]) as pt_fd:
    Merge(pt_fd.read(), ppl_msg)
print ppl_msg

3. Create two ASCII text-format files, one without semicolons and one with:
19:48 0 user@host:~/pb_bug>>cat without_semicolon.pb.txt
person {
  id: 3
  name: "foo"
}
19:48 0 user@host:~/pb_bug>>cat with_semicolon.pb.txt
person {
  id: 3;
  name: "foo";
}

4. Pass the content of each file from step 3 as the ``text'' param to the google.protobuf.text_format.Merge() function:
19:48 0 user@host:~/pb_bug>>./reproduce.py without_semicolon.pb.txt
person {
  id: 3
  name: "foo"
}
19:48 0 user@host:~/pb_bug>>./reproduce.py with_semicolon.pb.txt
Traceback (most recent call last):
  File "./reproduce.py", line 9, in <module>
    Merge(pt_fd.read(), ppl_msg)
  File "/skynet/python/lib/python2.6/site-packages/protobuf-2.3.0-py2.6.egg/google/protobuf/text_format.py", line 138, in Merge
    _MergeField(tokenizer, message)
  File "/skynet/python/lib/python2.6/site-packages/protobuf-2.3.0-py2.6.egg/google/protobuf/text_format.py", line 216, in _MergeField
    _MergeField(tokenizer, sub_message)
  File "/skynet/python/lib/python2.6/site-packages/protobuf-2.3.0-py2.6.egg/google/protobuf/text_format.py", line 172, in _MergeField
    name = tokenizer.ConsumeIdentifier()
  File "/skynet/python/lib/python2.6/site-packages/protobuf-2.3.0-py2.6.egg/google/protobuf/text_format.py", line 406, in ConsumeIdentifier
    raise self._ParseError('Expected identifier.')
google.protobuf.text_format.ParseError: 3:3 : Expected identifier.
19:49 1 user@host:~/pb_bug>>echo $?
1


What is the expected output? What do you see instead?

I expected the file with semicolons to be parsed successfully, but the parser fails with the ParseError shown above. As far as I can see, the C++ API does not fail on the same input.

What version of the product are you using? On what operating system?

protobuf 2.4.1 / 2.5.0 on FreeBSD 10.0-CURRENT / Linux 3.2.0-25-server.

Please provide any additional information below.

The issue can be fixed by the attached patch. The patch uses the same approach as the C++ API: it does not fail on the ``;'' and ``,'' symbols.
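
The attached patch is not reproduced in this message. As a minimal, self-contained sketch of the approach it describes (accept an optional ``;'' or ``,'' after each field instead of failing), the toy parser below handles a small subset of the text format. The names tokenize, parse_fields and parse are hypothetical and are not part of google.protobuf.text_format; values are kept as raw token strings.

```python
import re

# Hypothetical toy tokenizer: quoted strings, identifiers, integers, punctuation.
_TOKEN = re.compile(
    r'\s*("(?:[^"\\]|\\.)*"|[A-Za-z_][A-Za-z0-9_]*|-?\d+|[{}:;,])')


def tokenize(text):
    """Split text-format input into a flat list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError('Unexpected character at offset %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_fields(tokens, pos=0, nested=False):
    """Parse fields starting at pos; return (list of (name, value), new pos)."""
    fields = []
    while pos < len(tokens):
        if tokens[pos] == '}':
            if not nested:
                raise ValueError('Unexpected "}"')
            return fields, pos + 1
        name = tokens[pos]
        if not re.match(r'[A-Za-z_]', name):
            raise ValueError('Expected identifier, got %r' % name)
        pos += 1
        # The colon is optional before a nested message.
        if pos < len(tokens) and tokens[pos] == ':':
            pos += 1
        if pos >= len(tokens) or tokens[pos] in ('}', ':', ';', ','):
            raise ValueError('Missing value for %r' % name)
        if tokens[pos] == '{':
            value, pos = parse_fields(tokens, pos + 1, nested=True)
        else:
            value = tokens[pos]
            pos += 1
        # Key step: tolerate one optional ';' or ',' after the field.
        if pos < len(tokens) and tokens[pos] in (';', ','):
            pos += 1
        fields.append((name, value))
    if nested:
        raise ValueError('Missing "}"')
    return fields, pos


def parse(text):
    """Parse text-format input into nested (name, value) lists."""
    return parse_fields(tokenize(text))[0]
```

With this sketch, the contents of with_semicolon.pb.txt and without_semicolon.pb.txt from step 3 parse to the same structure, because the separator is skipped after the value has been consumed and before the next identifier is read.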

Attachments:
        patch-python_google_protobuf_text_format.py  518 bytes
