This was a great description. I appreciate you taking the time to write that out. I hadn't been able to find something as clear as this in the documentation. Thank you!
On Friday, September 9, 2022 at 5:07:10 PM UTC-4 [email protected] wrote:

> Yeah, it's a bit confusing, but the numbers are not types. They are
> field numbers and array indices -- that's all. So the table is
> descriptor.proto.
>
> Start with the FileDescriptorProto and dereference by field number; if you
> hit an array, the next number is an index:
>
> message FileDescriptorProto {
>   optional string name = 1;     // file name, relative to root of source tree
>   optional string package = 2;  // e.g. "foo", "foo.bar", etc.
>   // Names of files imported by this file.
>   repeated string dependency = 3;
>   ...
>   // All top-level definitions in this file.
>   repeated DescriptorProto message_type = 4;
>   repeated EnumDescriptorProto enum_type = 5;
>   repeated ServiceDescriptorProto service = 6;
>   repeated FieldDescriptorProto extension = 7;
> }
>
> A path starting with 1 would refer to the file name (shouldn't have any
> further numbers).
> A path starting with 2 would refer to the package (shouldn't have any
> further numbers).
> A path starting with 3 would refer to a dependency (import); the next
> number in the path is an array index (which dependency).
> A path starting with 4 refers to a message; the next number is the array
> index that tells you which message.
> A path starting with 5 refers to an enum; the next number is the array
> index that tells you which enum.
> A path starting with 6 refers to a service; the next number is the array
> index that tells you which service.
>
> If you have a path starting with [4,0,...] you are looking at
> fileDescriptor.getMessageType(0);
> If you have a path starting with [4,1,...] you are looking at
> fileDescriptor.getMessageType(1);
> If you have a path starting with [5,2,...] you are looking at
> fileDescriptor.getEnumType(2);
>
> The next path element tells you the field number within the top-level
> item's descriptor.
> For example, paths that point into a top-level message definition:
>
> message DescriptorProto {
>   optional string name = 1;
>   repeated FieldDescriptorProto field = 2;
>   repeated FieldDescriptorProto extension = 6;
>   repeated DescriptorProto nested_type = 3;
>   repeated EnumDescriptorProto enum_type = 4;
>   ...
> }
>
> A path [4,0,2,1,...] corresponds to:
> fileDescriptor.getMessageType(0).getField(1);
> A path [4,0,3,1,2,3...] corresponds to:
> fileDescriptor.getMessageType(0).getNestedType(1).getField(3);
>
> Just follow the proto field numbers.
>
>
> On Fri, Sep 9, 2022 at 2:45 PM Kyle Papili <[email protected]> wrote:
>
>> My question is far dumber haha. Is there a table that describes what
>> field numbers correlate to what object types?
>>
>> I've seen 1,2,3,4,5,6,7 show up in paths as field numbers. My naive brain
>> was under the impression that they correlated to object types, no?
>>
>> 4: Message, 5: Enum, 6: Extension??
>>
>> Is this not correct? Is there a table that can show me what each field
>> number correlates to?
>>
>> On Friday, September 9, 2022 at 4:38:21 PM UTC-4 [email protected] wrote:
>>
>>> The ints in the path should be the field numbers and array indices along
>>> the way from the top-level file descriptor proto, like this:
>>> Path: [4, 0, 2, 0]
>>> Starting with the FileDescriptorProto:
>>> 4 -> FileDescriptorProto {
>>>        ...
>>>        *repeated DescriptorProto message_type = 4;*
>>>        repeated EnumDescriptorProto enum_type = 5;
>>>      }
>>> 0 -> index into FileDescriptorProto.message_type[0]
>>> 2 -> DescriptorProto {
>>>        optional string name = 1;
>>>        *repeated FieldDescriptorProto field = 2;*
>>>        ...
>>>      }
>>> 0 -> index into DescriptorProto.field[0]
>>>
>>> Thus this path/Location [4, 0, 2, 0] applies to the whole field
>>> statement.
>>> I believe the index of a message in the message_type array generally
>>> corresponds to the order of all top-level message items in the file.
>>> I also believe that the index of a field likewise corresponds to the
>>> ordering of fields within the message.
>>>
>>> So if you have to deal with nested messages, the path will start with:
>>> [4, (top-level-message-index), 3, (index-of-nested-message-type), ...]
>>>
>>> If I remember correctly, this breaks down for options because sometimes
>>> the comments/location for an option are dropped, and when they are
>>> present the path points to field 999, the uninterpreted options.
>>>
>>> But maybe you had already gotten that far and I misunderstood
>>> your question.
>>>
>>>
>>> On Fri, Sep 9, 2022 at 2:15 PM Kyle Papili <[email protected]> wrote:
>>>
>>>> Is there somewhere in the documentation that provides a clear table
>>>> describing which numbers in the path correlate to which types? I have
>>>> found some inconsistencies with what I had thought. Any link to a table
>>>> like this?
>>>>
>>>> On Friday, September 9, 2022 at 4:14:46 PM UTC-4 [email protected]
>>>> wrote:
>>>>
>>>>> Ah, no, there is no magic. I only meant that if you wanted to have one
>>>>> part of your code match up location data to descriptor objects and
>>>>> attach the location info directly, you could do it in a custom option.
>>>>> There's no getting around the actual awkward stepping through the paths
>>>>> to match them up.
>>>>>
>>>>> On Fri, Sep 9, 2022 at 2:08 PM Kyle Papili <[email protected]> wrote:
>>>>>
>>>>>> Yes, the "hacky method" proposed by shaod@ is basically what I am
>>>>>> doing currently. It just seems to be unnecessarily complicated.
>>>>>>
>>>>>> What do you mean by "store the location object in a custom option
>>>>>> extension on the object in question"? How would I store the location
>>>>>> object as a custom extension of the object without knowing the object?
>>>>>> If I knew the object that the location corresponded to, then my problem
>>>>>> would be resolved.
>>>>>> The only way to match up location objects to Proto objects, from
>>>>>> what I've found, is the hacky path traversal suggested by shaod@. Am I
>>>>>> missing something here?
>>>>>>
>>>>>> On Friday, September 9, 2022 at 4:02:05 PM UTC-4 [email protected]
>>>>>> wrote:
>>>>>>
>>>>>>> Unfortunately, the only way to know the path to the Location object
>>>>>>> is to know the path to the descriptor proto object in question.
>>>>>>> Alternatively, you could iterate through all the sourcecodeinfo
>>>>>>> elements and use their paths to navigate to the correct descriptor
>>>>>>> object.
>>>>>>> One technique I have used in the past is to iterate through all the
>>>>>>> sourcecodeinfo elements and store the location object in a custom
>>>>>>> option extension on the object in question (or the parent object if it
>>>>>>> is something that doesn't have options).
>>>>>>>
>>>>>>> Also, as shaod@ points out, some comments will not show up in
>>>>>>> sourcecodeinfo.
>>>>>>>
>>>>>>> On Wednesday, September 7, 2022 at 4:11:32 PM UTC-6 [email protected]
>>>>>>> wrote:
>>>>>>>
>>>>>>>> First keep in mind that some comments are detached and thus ignored
>>>>>>>> by SourceCodeInfo.
>>>>>>>>
>>>>>>>> That being said, IIRC I've seen a very hacky way to achieve similar
>>>>>>>> goals:
>>>>>>>> https://github.com/protocolbuffers/protobuf/blob/3322c0b92a5001ade92608d75891d63c749d624d/src/google/protobuf/compiler/parser_unittest.cc#L2472
>>>>>>>>
>>>>>>>> On Thursday, September 1, 2022 at 7:16:52 AM UTC-7
>>>>>>>> [email protected] wrote:
>>>>>>>>
>>>>>>>>> I'm parsing a large number of protobuf files and am using the
>>>>>>>>> Source Code Info descriptor to extract comment data from the source
>>>>>>>>> files as well. I currently use the FileDescriptorProto.ListFields()
>>>>>>>>> method to extract the DescriptorProto objects I care about as well
>>>>>>>>> as the SourceCodeInfo.
>>>>>>>>>
>>>>>>>>> To my knowledge, the only way to pair up Location fields with the
>>>>>>>>> corresponding objects is via the path attribute
>>>>>>>>> <https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.descriptor.pb>.
>>>>>>>>>
>>>>>>>>> This is fine, except that it involves me manually stepping
>>>>>>>>> through said path to land at my parsed Protobuf object. This gets
>>>>>>>>> complicated when dealing with layers of nested_types, and I am
>>>>>>>>> convinced there must be a way for me to extract the path from the
>>>>>>>>> particular DescriptorProto object and then use that to match up the
>>>>>>>>> object with the path specified in the corresponding Location field.
>>>>>>>>>
>>>>>>>>> In short: How can I easily pair up DescriptorProto objects with
>>>>>>>>> the Location objects that correspond to them? Specifically for
>>>>>>>>> comment parsing purposes.
>>>>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Protocol Buffers" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/protobuf/0c8d36db-53f9-4179-942f-201cd205b9dfn%40googlegroups.com
>>>>>
>>>>> --
>>>>> Jerry Berg | Software Engineer | [email protected] | 720-808-1188
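For anyone landing on this thread later, the "just follow the proto field numbers" rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: FIELDS and resolve_path are hypothetical names, the table is a partial copy of the two descriptor.proto excerpts quoted in the thread (not the full file), and the demo uses SimpleNamespace stand-ins rather than real descriptor protos.

```python
from types import SimpleNamespace

# Field number -> (attribute name, is_repeated, message type of the child).
# Partial table, copied only from the descriptor.proto excerpts quoted in
# this thread; anything not listed here raises KeyError below.
FIELDS = {
    "FileDescriptorProto": {
        1: ("name", False, None),
        2: ("package", False, None),
        3: ("dependency", True, None),
        4: ("message_type", True, "DescriptorProto"),
        5: ("enum_type", True, "EnumDescriptorProto"),
        6: ("service", True, "ServiceDescriptorProto"),
        7: ("extension", True, "FieldDescriptorProto"),
    },
    "DescriptorProto": {
        1: ("name", False, None),
        2: ("field", True, "FieldDescriptorProto"),
        3: ("nested_type", True, "DescriptorProto"),
        4: ("enum_type", True, "EnumDescriptorProto"),
        6: ("extension", True, "FieldDescriptorProto"),
    },
}


def resolve_path(file_proto, path):
    """Walk a SourceCodeInfo-style path starting at a file descriptor."""
    node, type_name = file_proto, "FileDescriptorProto"
    steps = iter(path)
    for number in steps:
        table = FIELDS.get(type_name)
        if table is None or number not in table:
            raise KeyError(f"field {number} of {type_name} is not in the partial table")
        attr, repeated, child_type = table[number]
        node = getattr(node, attr)
        if repeated:
            # For a repeated field, the next path element is the array index.
            index = next(steps, None)
            if index is None:
                return node  # the path names the whole repeated field
            node = node[index]
        type_name = child_type
    return node


# Stand-in objects shaped like descriptor protos (hypothetical demo data).
field_a = SimpleNamespace(name="a")
field_b = SimpleNamespace(name="b")
inner = SimpleNamespace(name="Inner", field=[field_a, field_b], nested_type=[])
outer = SimpleNamespace(name="Outer", field=[field_a], nested_type=[inner])
demo_file = SimpleNamespace(name="demo.proto", message_type=[outer])

print(resolve_path(demo_file, [4, 0]).name)              # Outer
print(resolve_path(demo_file, [4, 0, 3, 0, 2, 1]).name)  # b
```

Because the walker only uses attribute access and list indexing, it should also accept a parsed FileDescriptorProto from the Python protobuf package, where the attribute names match the .proto field names. The table would need to be extended by hand for enums, services, and options before it could cover every path that shows up in SourceCodeInfo.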
