Making searches inside a message file

2009-06-20 Thread green

I am interested in using protocol buffers for the following scenario:

I have a large number of small data points I need to serialize to a
file. Once this serialization process is complete, the data will be
strictly read only, accessed by multiple threads, until I need to
create a new version of the file. The application will access the file
at startup, and while executing will need to retrieve data points
based on a given parameter. Pretty much behaves like tabular data you
would find in a database table. I need to retrieve a row, based on a
primary key. Yet a traditional database is not a good solution in
this situation.

My understanding is that it's not a good idea to have large messages,
but you can have a large number of small messages within a file, and
it shouldn't be a problem. I have seen the .proto file structure and
how messages are defined. What I don't understand (maybe I missed
that part in the documentation) is how to search for messages
within a file. If I use the repeated qualifier, it will let me have
more than one message in another one, but I'll just retrieve it as a
list. I can't specify what subset of messages I want.

Can I sort messages based on a given field?
Can I request a subset of messages by index range, or some other
criteria?
Can I browse through a message file, given a particular search
parameter?
Can I have some sort of Map inside the .proto definition, where I
organize elements in key-value fashion?
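
For what it's worth, one common pattern (outside the protobuf API itself, which has no search support) is to write each message length-prefixed and keep a side index from primary key to file offset. A minimal sketch, with plain bytes standing in for `msg.SerializeToString()` output, and the framing and helper names invented for illustration:

```python
import struct

def write_records(path, records):
    """records: iterable of (key, payload_bytes) pairs, where payload_bytes
    would come from msg.SerializeToString(). Returns {key: byte_offset}."""
    index = {}
    with open(path, "wb") as f:
        for key, payload in records:
            index[key] = f.tell()
            # 4-byte big-endian length prefix, then the serialized message.
            f.write(struct.pack(">I", len(payload)))
            f.write(payload)
    return index

def read_record(path, index, key):
    """Seek straight to one record by primary key; the returned bytes
    would be handed to msg.ParseFromString()."""
    with open(path, "rb") as f:
        f.seek(index[key])
        (length,) = struct.unpack(">I", f.read(4))
        return f.read(length)
```

The index itself could be stored as one more serialized message at a known position in the file, e.g. a key-to-offset map.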

Alternatively, are my assumptions of what I should be able to do with
protocol buffers wrong from the start? I assumed this kind of thing
was possible since the drizzle devs are using protocol buffers for
their database implementation. Link below.

http://drizzle.org/wiki/Table_Proto_Definition

Thanks in advance.
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en



Re: RPC Design Structure

2008-11-19 Thread Shane Green

Thanks for the breakdown, that's very helpful.  I had some trouble
finding details about how the PB RPC terminology mapped to what I'm
familiar with.

It sounds like the system in question has a single public service which
delegates calls to back-end services distributed across the machines
available to it.  It seems as though the public service provides an
interface which is essentially a composition of the interfaces provided
by the services available to it.  

By accepting requests for the methods it provides on behalf of these
back-end services, and delegating each request to the appropriate
back-end service, it is essentially multiplexing multiple services from
a single network location.

If this is at all accurate, then I believe that the API exported by the
public service should be built, dynamically, based on the set of
interfaces available to it.  Taking this approach would allow each
service to define its own interface, while all the functionality
available on the network could be exposed as though it were provided
by a single entity, assuming there are no naming conflicts ;-)

Or maybe I'm completely confused about the setup.



Best regards,
Shane

On Wed, 2008-11-19 at 22:37 -0800, Kenton Varda wrote:
 The design of an RPC system is a large topic and there are lots of
 different ways you could do it.  The RPC interfaces provided by
 Protocol Buffers are meant to provide the minimum support necessary to
 allow protoc to generate type-safe service stubs.  How you want to
 implement them is up to you.
 
 
 This is described in the docs, but to summarize, the basic classes
 are:
 
 
 Service: An object that receives messages, possibly from remote
 clients.  protoc generates an interface corresponding to each service
 in a .proto file.  These interfaces are implemented by the server
 application.
 
 
 RpcChannel: Represents an abstract tunnel to a single service,
 allowing you to send messages just to that service.  This is an
 abstract interface which should be implemented by the RPC library.
 
 
 
 RpcController: Manages state related to a single remote procedure call
 -- that is, a single message sent to the server, and its corresponding
 response.  This is an abstract interface which should be implemented
 by the RPC library.
 
 
 
 Stub: A fake implementation of a service interface which just forwards
 messages to an RpcChannel.  This makes the service appear to be a
 local object when it is not.  protoc automatically generates a stub
 class for every service type.
 
 
 
 
 Note that you could easily have multiple RpcChannels that share a
 single TCP connection and lead to multiple service objects running on
 a single server.  The interfaces are designed to put as few
 restrictions on implementations as possible.
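
The division of labor among these classes can be sketched in a few lines of illustrative Python. The class and method names here are invented for the sketch and are not the generated protobuf API; the point is only how a stub forwards calls through a channel:

```python
class SearchService:
    # Role of a protoc-generated service interface: the server
    # application would subclass this and implement the methods.
    def Search(self, controller, request, done):
        raise NotImplementedError

class RpcController:
    # Per-call state: one instance per request/response pair.
    def __init__(self):
        self.failed = False
        self.error_text = ""

class LoopbackChannel:
    # Stands in for a real RpcChannel; a real implementation would
    # send the request over the network instead of calling locally.
    def __init__(self, service):
        self._service = service
    def call_method(self, method_name, controller, request, done):
        getattr(self._service, method_name)(controller, request, done)

class SearchServiceStub(SearchService):
    # Role of the protoc-generated stub: implements the interface but
    # only forwards each call, with its controller, to the channel.
    def __init__(self, channel):
        self._channel = channel
    def Search(self, controller, request, done):
        self._channel.call_method("Search", controller, request, done)
```

A caller holds only the stub, so the service appears local whether the channel is a loopback, a shared TCP connection, or anything else.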
 
 
 On Wed, Nov 19, 2008 at 9:52 PM, codeazure [EMAIL PROTECTED]
 wrote:
 
 OK, now you've confused me :-)
 
 I don't understand the exact relationship between all these classes,
 which is why I'm asking the question. If I want to build an
 application where I have a number of services that share a single TCP
 port, what organisation do I need to use?
 
 You mention multiplexing services - what does that mean for a client
 application using the connection?
 
 A UML diagram (or similar) showing the relationship between
 controllers, channels and services would really aid my understanding
 of how this system would operate. Perhaps these terms are in common
 usage in other RPC systems, but because I haven't used any, I'm
 uncertain about what these entities do. I've read the documentation
 several times, but the overview of how it works hasn't clicked.
 
 Thanks,
 Jeff
 
 On Nov 20, 12:54 pm, Kenton Varda [EMAIL PROTECTED] wrote:
  RpcController objects are per-request, not per-server or per-service.
  For every RPC request you make, you should have another RpcController
  object (though you can reuse an object by calling Clear() as long as
  you aren't making two requests at once).
  RpcChannel objects are per-service.  Is that what you were thinking
  of?  A single RpcChannel represents a connection to a single Service.
  However, there's nothing stopping you from multiplexing multiple
  RpcChannels across a single TCP connection, or creating a protocol
  that allows you to choose between multiple services exported by a
  server when constructing an RpcChannel.



[Fwd: Re: Streaming]

2008-12-05 Thread Shane Green

Thanks very much Jon (see below).  You make good points and I like the
approach you describe.  I am still thinking, however, that there is
power in the ability for message instances to write and parse themselves
from a stream.

A message instance could be passed a stream object which chains back to
the network connection from which bytes are being received.  A stop-flag
based parsing mechanism could read from this stream, initializing the
message's properties and exiting when the serialization of that message
instance ends.  At this point, a new message instance could be created,
and the process repeated.  

The type of message doing the parsing could vary from message to
message, even with the serializations being sent and received back to
back.  This mechanism would work regardless of field-types being
streamed.  A message type consisting solely of varint fields, whose
length is determined while reading the varint's value, would support
streaming no differently than any other message type.

The solution also seems to support every requirement met by the
original string-based approach.  Messages serialized to a buffer could
just as easily be initialized from that buffer as from the string
contained by the buffer.

m1 = Message()
buffer = Buffer()
[...] (initialize instance vars)
m1.SerializeToBuffer(buffer)

m2 = Message()
m2.ParseFromBuffer(buffer)

This produces the same result as: 

m2 = Message()
bytes = m1.SerializeToString()
m2.ParseFromString(bytes)

The string-based parse would ignore the stop bit, parsing the entire
string.  The buffer-based parse would stop when it reached the stop
bit, producing the same result.

Handling of concatenated serializations is supported through repeated
calls to parse from buffer:

m1 = Message()
[...] (initialize instance vars)
m2 = Message()
[...] (initialize instance vars)

buffer = Buffer()
m1.SerializeToBuffer(buffer)
m2.SerializeToBuffer(buffer)

m3 = Message()
m3.ParseFromBuffer(buffer)
m3.ParseFromBuffer(buffer)

This would produce the same result as:

m3 = Message()
m3.ParseFromString(m1.SerializeToString() + m2.SerializeToString())

As long as an unused (and never to be used) field number is used to
generate the stop bit's key, I don't believe there are any
incompatibilities between buffer-based message marshalling and the
existing string-based code.  A very simple usage:

# Sending side
for message in messages:
  message.SerializeToBuffer(buffer)

# Receiving side
for msgtype in types:
  message = msgtype()
  message.ParseFromBuffer(buffer)

Unless I've overlooked something, it seems like the stream based
marshalling and unmarshalling is powerful, simple, and completely
compatible with all existing code.  But there is a very real chance I've
overlooked something...
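
The stop-key idea can be made concrete for the varint-only case. The following is a toy encoding written for illustration, not the protobuf library API: each field is a varint key (field number shifted left by 3, wire type 0) followed by a varint value, and a key built from field number 19000 (borrowed from the range protobuf reserves for its own use) marks end-of-message.

```python
STOP_KEY = 19000 << 3  # reserved field number, wire type 0

def encode_varint(n):
    # Standard base-128 varint: 7 payload bits per byte, high bit = "more".
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def decode_varint(stream):
    shift = result = 0
    while True:
        b = stream.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7

def write_message(stream, fields):
    """fields: {field_number: int_value}; terminated by the stop key."""
    for number, value in fields.items():
        stream.write(encode_varint(number << 3))
        stream.write(encode_varint(value))
    stream.write(encode_varint(STOP_KEY))  # terminator instead of a length prefix

def read_message(stream):
    """Parse key/value pairs until the stop key; leaves the stream
    positioned at the start of the next message."""
    fields = {}
    while True:
        key = decode_varint(stream)
        if key == STOP_KEY:
            return fields
        fields[key >> 3] = decode_varint(stream)
```

Because the parser only inspects keys at field boundaries, back-to-back messages can be read off one stream with repeated read_message calls, mirroring the ParseFromBuffer usage above.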




- Shane


 Forwarded Message 
 From: Jon Skeet [EMAIL PROTECTED]
 To: Shane Green [EMAIL PROTECTED]
 Subject: Re: Streaming
 Date: Fri, 5 Dec 2008 08:19:41 +
 
 2008/12/5 Shane Green [EMAIL PROTECTED]
 Thanks Jon.  Those are good points.  I rather liked the
 self-delimiting nature of fields, and thought this method would bring
 that feature up to the message level, without breaking any of the
 existing capabilities.  So my goal was a message which could truly be
 streamed; perhaps even sent without knowing its own size up front.
 Perhaps I overlooked something?
 
 Currently the PB format requires that you know the size of each
 submessage before you send it. You don't need to know the size of the
 whole message, as it's assumed to be the entire size of the
 datastream. It's unfortunate that you do need to provide the whole
 message to the output stream though, unless you want to manually
 serialize the individual fields.
 
 My goal was slightly different - I wanted to be able to stream a
 sequence of messages. The most obvious use case (in my view) is a log.
 Write out a massive log file as a sequence of entries, and you can
 read it back in one at a time. It's not designed to help to stream a
 single huge message though.
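
That sequence-of-messages approach can be sketched with a hand-rolled varint length prefix. This is illustrative only: in the Java API, writeDelimitedTo/parseDelimitedFrom provide this framing, and the plain byte payloads below stand in for SerializeToString() output.

```python
def write_delimited(stream, payload):
    """Write one log entry: varint length prefix, then the payload bytes."""
    length, prefix = len(payload), bytearray()
    while True:  # varint-encode the length
        b = length & 0x7F
        length >>= 7
        prefix.append(b | (0x80 if length else 0))
        if not length:
            break
    stream.write(bytes(prefix))
    stream.write(payload)

def read_delimited(stream):
    """Read back one entry, or return None at a clean end of stream."""
    shift = length = 0
    while True:  # varint-decode the length
        first = stream.read(1)
        if not first:
            return None
        b = first[0]
        length |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return stream.read(length)
```

Reading the log back is then just a loop calling read_delimited until it returns None, handing each payload to the appropriate message type's parser.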
  
 Would you mind if I resent my questions to the group?  I lack
 confidence and wanted to make sure I wasn't overlooking something
 ridiculous, but am thinking that the exchange would be informative.
 
 Absolutely. Feel free to quote anything I've written if you think it
 helps.
 
 Also, how are you serializing and parsing messages as if they are
 repeated fields of a container message?  Is there a fair bit of
 parsing or work being done outside the standard protocol-buffer APIs?
 
 There's not a lot of work, to be honest. On the parsing side the main
 difficulty is getting a type-safe delegate to read a message from the
 stream. The writing side is trivial. Have a look at the code:
 
 http://github.com/jskeet/dotnet-protobufs/tree/master/src

[protobuf] OpenTelemetry JSON Protobuf deviations

2023-09-07 Thread 'John Green' via Protocol Buffers
Hi all!

OpenTelemetry documents some deviations from the standard protobuf/JSON 
mapping. See here: https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding

Is there a way to implement those deviations with the Java SDK?

Thanks for your help!




Re: [protobuf] OpenTelemetry JSON Protobuf deviations

2023-09-08 Thread 'John Green' via Protocol Buffers
Hi Jerry,

Yes, JsonFormat is working properly for us. 

However, we'd like to use it to implement the OpenTelemetry deviations 
<https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding>; is this 
possible or planned, or should we look somewhere else?

We noticed that one of the deviations, related to enums, can be solved 
with this feature 
<https://protobuf.dev/reference/java/api-docs/com/google/protobuf/util/JsonFormat.Printer.html#printingEnumsAsInts-->.

Many Thanks!

On Thursday, September 7, 2023 at 5:53:50 PM UTC+2 Jerry Berg wrote:

> Hi John,
>
> Is the Java implementation of JSON Format for protobuf not working for you?
>
> https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java
>
>
> On Thu, Sep 7, 2023 at 9:16 AM 'John Green' via Protocol Buffers <
> prot...@googlegroups.com> wrote:
>
>> Hi all!
>>
>> OpenTelemetry documents some deviations regarding mapping protobuf/json. 
>> See here: 
>> https://opentelemetry.io/docs/specs/otlp/#json-protobuf-encoding
>>
>> Is there a way to implement those deviations with the java SDK?
>>
>> Thanks for your help!
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to protobuf+u...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/protobuf/e1c3fb2c-d12f-443b-b722-780aa6050a8en%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/protobuf/e1c3fb2c-d12f-443b-b722-780aa6050a8en%40googlegroups.com?utm_medium=email_source=footer>
>> .
>>
>
>
> -- 
> Jerry Berg | Software Engineer | gb...@google.com | 720-808-1188
>
