Making searches inside a message file
I am interested in using protocol buffers for the following scenario: I have a large number of small data points I need to serialize to a file. Once this serialization process is complete, the data will be strictly read-only, accessed by multiple threads, until I need to create a new version of the file. The application will access the file at startup, and while executing will need to retrieve data points based on a given parameter. It behaves much like tabular data in a database table: I need to retrieve a row based on a primary key. Yet a traditional database is not a good solution in this situation.

My understanding is that it's not a good idea to have large messages, but that a large number of small messages within a file shouldn't be a problem. I have seen the .proto file structure and how messages are defined. What I don't understand (maybe I missed that part in the documentation) is how to search for messages within a file. If I use the repeated qualifier, I can have more than one message inside another one, but I'll just retrieve it as a list; I can't specify what subset of messages I want.

Can I sort messages based on a given field? Can I request a subset of messages by index range, or some other criteria? Can I browse through a message file, given a particular search parameter? Can I have some sort of map inside the .proto definition, where I organize elements in key-value fashion?

Alternatively, are my assumptions about what I should be able to do with protocol buffers wrong from the start? I assumed this kind of thing was possible since the Drizzle devs are using protocol buffers for their database implementation. Link below.

http://drizzle.org/wiki/Table_Proto_Definition

Thanks in advance.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/protobuf?hl=en -~--~~~~--~~--~--~---
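The usual answer to the question above is that protocol buffers provide serialization only, with no query capability: the application deserializes all the small messages and builds its own in-memory index, e.g. a dict keyed by the primary-key field. A minimal Python sketch of that pattern, where `DataPoint` is a hypothetical stand-in (a namedtuple here) for a protoc-generated message class; a real program would populate the list via `message.ParseFromString()` on records read from the file:

```python
# Sketch: protobuf gives you serialization, not querying. After
# deserializing the messages, build your own index in application code.
# `DataPoint` is an illustrative stand-in for a protoc-generated class.
from collections import namedtuple

DataPoint = namedtuple("DataPoint", ["key", "value"])

def build_index(points):
    """Index deserialized messages by their primary-key field."""
    return {p.key: p for p in points}

points = [DataPoint(1, "a"), DataPoint(2, "b"), DataPoint(3, "c")]
index = build_index(points)

# Primary-key lookup, like fetching a row from a table.
print(index[2].value)

# Range or other criteria queries are also just application code.
subset = [p for k, p in sorted(index.items()) if 1 <= k <= 2]
print([p.value for p in subset])
```

Sorting, index-range selection, and key-value lookup all live in this layer rather than in the .proto definition itself.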
Re: RPC Design Structure
Thanks for the breakdown, that's very helpful. I had some trouble finding details about how the PB RPC terminology mapped to what I'm familiar with.

It sounds like the system in question has a single public service which delegates calls to back-end services distributed across the machines available to it. The public service provides an interface which is essentially a composition of the interfaces provided by those back-end services. By accepting requests for the methods it provides on their behalf, and handling each by delegating to the appropriate back-end service, it is essentially multiplexing multiple services from a single network location.

If this is at all accurate, then I believe the API exported by the public service should be built dynamically, based on the set of interfaces available to it. Taking this approach would allow each service to define its own interface, while all the functionality available on the network could be exposed as though it were provided by a single entity, assuming there are no naming conflicts ;-) Or maybe I'm completely confused about the setup.

Best regards, Shane

On Wed, 2008-11-19 at 22:37 -0800, Kenton Varda wrote: The design of an RPC system is a large topic and there are lots of different ways you could do it. The RPC interfaces provided by Protocol Buffers are meant to provide the minimum support necessary to allow protoc to generate type-safe service stubs. How you want to implement them is up to you. This is described in the docs, but to summarize, the basic classes are:

Service: An object that receives messages, possibly from remote clients. protoc generates an interface corresponding to each service in a .proto file. These interfaces are implemented by the server application.

RpcChannel: Represents an abstract tunnel to a single service, allowing you to send messages just to that service. This is an abstract interface which should be implemented by the RPC library.

RpcController: Manages state related to a single remote procedure call -- that is, a single message sent to the server, and its corresponding response. This is an abstract interface which should be implemented by the RPC library.

Stub: A fake implementation of a service interface which just forwards messages to an RpcChannel. This makes the service appear to be a local object when it is not. protoc automatically generates a stub class for every service type.

Note that you could easily have multiple RpcChannels that share a single TCP connection and lead to multiple service objects running on a single server. The interfaces are designed to put as few restrictions on implementations as possible.

On Wed, Nov 19, 2008 at 9:52 PM, codeazure [EMAIL PROTECTED] wrote: OK, now you've confused me :-) I don't understand the exact relationship between all these classes, which is why I'm asking the question. If I want to build an application where I have a number of services that share a single TCP port, what organisation do I need to use? You mention multiplexing services - what does that mean for a client application using the connection? A UML diagram (or similar) showing the relationships between controllers, channels and services would really aid my understanding of how this system would operate. Perhaps these terms are in common usage in other RPC systems, but because I haven't used any, I'm uncertain about what these entities do. I've read the documentation several times, but the overview of how it works hasn't clicked. Thanks, Jeff

On Nov 20, 12:54 pm, Kenton Varda [EMAIL PROTECTED] wrote: RpcController objects are per-request, not per-server or per-service. For every RPC request you make, you should have a separate RpcController object (though you can reuse an object by calling Clear() as long as you aren't making two requests at once). RpcChannel objects are per-service. Is that what you were thinking of? A single RpcChannel represents a connection to a single Service.
However, there's nothing stopping you from multiplexing multiple RpcChannels across a single TCP connection, or creating a protocol that allows you to choose between multiple services exported by a server when constructing an RpcChannel.
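To make the multiplexing idea concrete, here is a toy Python sketch (not the protobuf RPC API itself, and all names are illustrative): several logical channels, each bound to one named service, share a single server-side entry point, and each request is tagged with its service name so it can be routed to the right service object.

```python
# Toy model of multiplexing multiple services behind one connection.
# A real RPC library would implement RpcChannel/Service on top of a
# transport; here the "transport" is a direct method call.

class Server:
    """Holds multiple service objects behind one shared entry point."""
    def __init__(self):
        self.services = {}

    def register(self, name, service):
        self.services[name] = service

    def dispatch(self, service_name, method, request):
        # Route by service name first, then by method name.
        return getattr(self.services[service_name], method)(request)

class Channel:
    """A logical channel to a single service over the shared server."""
    def __init__(self, server, service_name):
        self.server = server
        self.service_name = service_name

    def call(self, method, request):
        return self.server.dispatch(self.service_name, method, request)

class EchoService:
    def echo(self, request):
        return request

class MathService:
    def double(self, request):
        return request * 2

# Two services multiplexed behind one server object; each client-side
# channel still talks to exactly one service, as in the PB design.
server = Server()
server.register("echo", EchoService())
server.register("math", MathService())

echo_channel = Channel(server, "echo")
math_channel = Channel(server, "math")
print(echo_channel.call("echo", "hi"))   # -> hi
print(math_channel.call("double", 21))   # -> 42
```

The point of the sketch is the separation Kenton describes: a channel is per-service, while the shared server object is free to fan requests out to any number of service implementations.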
[Fwd: Re: Streaming]
Thanks very much Jon (see below). You make good points and I like the approach you describe. I am still thinking, however, that there is power in the ability for message instances to write and parse themselves from a stream.

A message instance could be passed a stream object which chains back to the network connection from which bytes are being received. A stop-flag based parsing mechanism could be passed this buffer object, and would handle reading the stream and initializing its properties, exiting when the serialization of that message instance stopped. At this point, a new message instance could be created, and the process repeated. The type of message doing the parsing could vary from message to message, even with the serializations being sent and received back to back.

This mechanism would work regardless of the field types being streamed. A message type consisting solely of varint fields, whose length is determined while reading the varint's value, would support streaming no differently than any other message type. The solution also seems to support every requirement supported by the original buffer type. Messages serialized to a buffer could just as easily be initialized from that buffer as they could from the string contained by the buffer.

m1 = Message()
buffer = Buffer()
[...] (initialize instance vars)
m1.SerializeToBuffer(buffer)
m2 = Message()
m2.ParseFromBuffer(buffer)

Produces the same result as:

m2 = Message()
bytes = m1.SerializeToString()
m2.ParseFromString(bytes)

The string-based parse would ignore the stop bit, parsing the entire string. The buffer-based parse would stop at the stop bit, producing the same result. Handling of concatenated serializations is supported through repeated calls to parse from buffer:

m1 = Message()
[...] (initialize instance vars)
m2 = Message()
[...] (initialize instance vars)
buffer = Buffer()
m1.SerializeToBuffer(buffer)
m2.SerializeToBuffer(buffer)
m3 = Message()
m3.ParseFromBuffer(buffer)
m3.ParseFromBuffer(buffer)

Would produce the same result as:

m3 = Message()
m3.ParseFromString(m1.SerializeToString() + m2.SerializeToString())

As long as an unused, and never to be used, field number is used to generate the stop bit's key, then I don't believe there are any incompatibilities between buffer-based message marshalling and the existing string-based code. A very easy usage:

# Sending side
for message in messages:
    message.SerializeToBuffer(buffer)

# Receiving side
for msgtype in types:
    message = msgtype()
    message.ParseFromBuffer(buffer)

Unless I've overlooked something, it seems like the stream-based marshalling and unmarshalling is powerful, simple, and completely compatible with all existing code. But there is a very real chance I've overlooked something...

- Shane

Forwarded Message From: Jon Skeet [EMAIL PROTECTED] To: Shane Green [EMAIL PROTECTED] Subject: Re: Streaming Date: Fri, 5 Dec 2008 08:19:41 +

2008/12/5 Shane Green [EMAIL PROTECTED]: Thanks Jon. Those are good points. I rather liked the self-delimiting nature of fields, and thought this method would bring that feature up to the message level, without breaking any of the existing capabilities. So my goal was a message which could truly be streamed; perhaps even sent without knowing its own size up front. Perhaps I overlooked something?

Currently the PB format requires that you know the size of each submessage before you send it. You don't need to know the size of the whole message, as it's assumed to be the entire size of the datastream. It's unfortunate that you do need to provide the whole message to the output stream though, unless you want to manually serialize the individual fields.

My goal was slightly different - I wanted to be able to stream a sequence of messages. The most obvious use case (in my view) is a log.
Write out a massive log file as a sequence of entries, and you can read it back in one at a time. It's not designed to help to stream a single huge message though. Would you mind if I resent my questions to the group? I lack confidence and wanted to make sure I wasn't overlooking something ridiculous, but am thinking that the exchange would be informative. Absolutely. Feel free to quote anything I've written if you think it helps. Also, how are you serializing and parsing messages as if they are repeated fields of a container message? Is there a fair bit of parsing or work being done outside the standard protocol-buffer APIs? There's not a lot of work, to be honest. On the parsing side the main difficulty is getting a type-safe delegate to read a message from the stream. The writing side is trivial. Have a look at the code: http://github.com/jskeet/dotnet-protobufs/tree/master/src
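The sequence-of-messages streaming Jon describes is conventionally done with a length prefix: each message is written as a varint length followed by its serialized bytes, so the reader always knows where one record ends and the next begins. A minimal Python sketch of that framing, using plain `bytes` payloads in place of real serialized messages (the varint encoding is the base-128 scheme protocol buffers use):

```python
import io

def write_varint(stream, value):
    """Write an unsigned integer in base-128 varint encoding."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # continuation bit set
        else:
            stream.write(bytes([byte]))
            return

def read_varint(stream):
    """Read a varint; returns None at a clean end of stream."""
    first = stream.read(1)
    if not first:
        return None
    result, shift, byte = 0, 0, first[0]
    while True:
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7
        byte = stream.read(1)[0]

def write_delimited(stream, payload):
    """Length-prefix one serialized message (any bytes object here)."""
    write_varint(stream, len(payload))
    stream.write(payload)

def read_delimited(stream):
    """Read back one length-prefixed message, or None at end of stream."""
    size = read_varint(stream)
    if size is None:
        return None
    return stream.read(size)

# Stream two "messages" back to back, then read them out one at a time,
# as you would when replaying a log file of serialized entries.
buf = io.BytesIO()
write_delimited(buf, b"entry-1")
write_delimited(buf, b"entry-2")
buf.seek(0)
print(read_delimited(buf))  # -> b'entry-1'
print(read_delimited(buf))  # -> b'entry-2'
print(read_delimited(buf))  # -> None
```

This is the same framing the Java API later exposed as `writeDelimitedTo`/`parseDelimitedFrom`; unlike the stop-bit proposal above, it requires knowing each message's size before writing it, but needs no reserved field number.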
[protobuf] OpenTelemetry JSON Protobuf deviations
Hi all! OpenTelemetry documents some deviations regarding mapping protobuf/json. See here: https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding Is there a way to implement those deviations with the Java SDK? Thanks for your help!
Re: [protobuf] OpenTelemetry JSON Protobuf deviations
Hi Jerry, Yes, JsonFormat is working properly for us. However, we'd like to use it to implement the OpenTelemetry deviations <https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding>; is that something possible / planned, or should we look somewhere else? We noticed that one of the deviations, related to enums, can be solved with this feature <https://protobuf.dev/reference/java/api-docs/com/google/protobuf/util/JsonFormat.Printer.html#printingEnumsAsInts-->. Many thanks!

On Thursday, September 7, 2023 at 5:53:50 PM UTC+2 Jerry Berg wrote:
> Hi John,
>
> Is the Java implementation of JSON Format for protobuf not working for you?
> https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java
>
> On Thu, Sep 7, 2023 at 9:16 AM 'John Green' via Protocol Buffers <prot...@googlegroups.com> wrote:
>> Hi all!
>>
>> OpenTelemetry documents some deviations regarding mapping protobuf/json. See here: https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding
>>
>> Is there a way to implement those deviations with the Java SDK?
>>
>> Thanks for your help!
>
> --
> Jerry Berg | Software Engineer | gb...@google.com | 720-808-1188
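One way to handle the OTLP deviations that a stock proto3 JSON printer does not cover is to post-process its output. The best-known deviation is that trace_id and span_id, which are bytes fields and therefore base64 in standard proto3 JSON, must be lowercase hex in OTLP JSON. A minimal Python sketch of that post-processing step, offered under assumptions (the field-name set and the input document are illustrative; in the Java SDK the same transform would run over the string produced by JsonFormat):

```python
import base64
import json

# OTLP JSON deviates from standard proto3 JSON for ID bytes fields:
# they must be lowercase hex rather than base64. This sketch rewrites
# those fields in JSON produced by a standard printer. The set of
# field names below is an assumption for illustration.
HEX_FIELDS = {"traceId", "spanId", "parentSpanId"}

def base64_to_hex(value):
    return base64.b64decode(value).hex()

def apply_otlp_deviations(node):
    """Recursively rewrite base64 ID fields to hex, in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in HEX_FIELDS and isinstance(value, str):
                node[key] = base64_to_hex(value)
            else:
                apply_otlp_deviations(value)
    elif isinstance(node, list):
        for item in node:
            apply_otlp_deviations(item)
    return node

# A span ID of bytes 00 11 22 33 44 55 66 77, base64-encoded the way
# standard proto3 JSON would print a bytes field.
doc = json.loads('{"spans": [{"spanId": "ABEiM0RVZnc="}]}')
apply_otlp_deviations(doc)
print(json.dumps(doc))
```

The enum deviation can be handled natively via `printingEnumsAsInts()` as noted above; a rewrite pass like this is only needed for the parts JsonFormat has no option for.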