Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-11 Thread Jacob Barrett
On Thu, May 4, 2017 at 2:14 PM, Jacob Barrett  wrote:

> > > One benefit of a messageHeader with chunk is that it gives us the
> > > ability to write different messages (multiplexing) on the same socket.
> > > And if a thread is ready it can write its message. Though we are not
> > > there yet, that will be required for a single-socket async architecture.
> > >
> >
> > I wouldn't tackle that at this layer. I would suggest adding a layer
> > between the message and TCP that creates channels that are opaque to the
> > message layer above. The message layer wouldn't know if it was talking to
> > multiple sockets to the client or single socket with multiple channels.
>
> Though we haven't really discussed it explicitly, the new protocol is
> adding the correlationID in the expectation that at some point it will be
> possible to execute requests out-of-order. One alternative to correlationID
> if we had multiple channels would be to use one channel per message or set
> of (ordered) messages. This would be a bit expensive if we used separate
> sockets for each, but if we had channels built into the protocol, it would
> be fine. The other use of correlationID might be for retries or to check up
> on a message after some issue leading to a disconnect. However, we have the
> EventID for that.
>
> As far as I know, we don't have the async, out-of-order functionality yet.
> I believe that in the current protocol, messages are ordered and
> synchronous -- they only happen in the order they're sent, and each
> operation blocks for the previous one to finish (though you could use
> multiple connections to get similar functionality).
>


My discussion here was around the concept of interleaving chunks of
individual messages, not out-of-order responses to individual messages. From
the end user's perspective, though, whether we use correlation IDs or
ordered request/response over channels is irrelevant. At that level the API
should be asynchronous anyway. Underneath we can decide whether a single
channel/socket has out-of-order messaging (not to be confused with
out-of-order partial message interleaving) or a correlation ID.

Interleaving meaning:
[Req1.1][Req2.1][Res2.1][Req1.2][Res1.1][Res2.2][Res1.2]
Correlation ID and part index required.

Out of order meaning:
[Req1][Req2][Res2][Res1]
Correlation ID required.

In order over channel:
1: [Req.1]  [Req.2][Res.1]   [Res.2]
2:[Req.1][Res.1]  [Res.2]
No correlation ID or part index required. Each request pulls a channel from a
pool. The naive approach is separate sockets; the advanced approach is a
channel sublayer on the stack.
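
To make the interleaving case concrete (purely illustrative, nothing here is
in the proposal), every chunk frame would have to carry something like:

// Illustrative only: a frame header for interleaved chunks. Every chunk on
// the wire says which logical message it belongs to and where it fits.
public final class ChunkFrame {
    public final int correlationId;  // which request/response this chunk belongs to
    public final int partIndex;      // 0-based position of the chunk within that message
    public final boolean last;       // true for the final chunk of the message
    public final byte[] payload;     // the chunk bytes

    public ChunkFrame(int correlationId, int partIndex, boolean last, byte[] payload) {
        this.correlationId = correlationId;
        this.partIndex = partIndex;
        this.last = last;
        this.payload = payload;
    }
}

The in-order-over-channel case needs none of those fields, which is the point
below.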

I really think in-order-over-channels makes life easier. Maybe a hybrid
approach exists with correlation IDs and channels, but you can solve the same
problem with just channels, just more of them. At the channel level it would
look like synchronous request/response.

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-11 Thread Galen M O'Sullivan
I think we should present the current proposal as the API and message
structure, as it would be laid out in an IDL. This way, we can experiment with
Protobuf, Thrift, etc. for serialization without having to specify the exact
wire format. This will make designing the protocol easier and allow us to
leverage existing serialization libraries. If we ever get into a fully,
manually specified binary protocol, charts of binary representations will be
useful, but for now I think using a library that takes care of serialization
for us will make our lives much easier.
I've made some changes (removing type data, removing RequestHeader and
ResponseHeader) that should make the wiki pages clearer.


On Thu, May 4, 2017 at 2:14 PM, Jacob Barrett  wrote:
> > One benefit of a messageHeader with chunk is that it gives us the
> > ability to write different messages (multiplexing) on the same socket.
> > And if a thread is ready it can write its message. Though we are not
> > there yet, that will be required for a single-socket async architecture.
> >
>
> I wouldn't tackle that at this layer. I would suggest adding a layer
> between the message and TCP that creates channels that are opaque to the
> message layer above. The message layer wouldn't know if it was talking to
> multiple sockets to the client or single socket with multiple channels.

Though we haven't really discussed it explicitly, the new protocol is
adding the correlationID in the expectation that at some point it will be
possible to execute requests out-of-order. One alternative to correlationID
if we had multiple channels would be to use one channel per message or set
of (ordered) messages. This would be a bit expensive if we used separate
sockets for each, but if we had channels built into the protocol, it would
be fine. The other use of correlationID might be for retries or to check up
on a message after some issue leading to a disconnect. However, we have the
EventID for that.

As far as I know, we don't have the async, out-of-order functionality yet.
I believe that in the current protocol, messages are ordered and
synchronous -- they only happen in the order they're sent, and each
operation blocks for the previous one to finish (though you could use
multiple connections to get similar functionality).

Galen

On Mon, May 8, 2017 at 2:55 PM, Ernest Burghardt 
wrote:

> +1 William, an even better example - that kind of representation will make
> it so much better/easier for geode users to implement against, regardless
> of language.
>
> On Mon, May 8, 2017 at 2:48 PM, William Markito Oliveira <
> william.mark...@gmail.com> wrote:
>
> > +1
> >
> > I think I've shared this before, but Kafka also has good (tabular)
> > representation for messages on their protocol.
> >
> > - https://kafka.apache.org/protocol#protocol_messages
> > - https://kafka.apache.org/protocol#protocol_message_sets
> >
> > On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt 
> > wrote:
> >
> > > Hello Geodes!
> > >
> > > Good discussion on what/how the messages will be/handled once a
> > connection
> > > is established.
> > >
> > > +1 to a simple initial handshake to establish version/supported
> features
> > > that client/server will be communicating.
> > >
> > > From what I've seen so far in the proposal it is missing a definition
> for
> > > the "connection"/disconnect messages.
> > > - expected to see it here:
> > > https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
> > >
> > > From a protocol perspective, this is currently a pain point for the
> > > geode-native library.
> > >
> > > As Jake mentioned previously, having messages that are class-like and
> > have
> > > a singular job helps client developers by having an explicit protocol
> to
> > > follow.
> > >
> > >
> > > The basic case a developer is going to exercise is to
> connect/disconnect.
> > > How to do this should be straightforward from the start.
> > >
> > > Geode probably does not need a 7 Layer OSI stack, but it might make
> sense
> > > to have a couple layers:
> > >
> > > 1 - transport  (network socket)
> > > 2 - protocol   (version/features)
> > > 3 - messaging (do cluster work)
> > >
> > > e.g.
> > > client library opens a socket to the server (layer 1 - check)
> > > client/server perform handshake and the connection is OPEN (layer 2 -
> > > check)
> > > pipe is open for business, client/server do work freely (layer 3 -
> check)
> > >
> > > When this is sorted out I think a couple simple sequence or activity
> > > diagrams would be very helpful to the visual-spatial folks in the
> > > community.
> > >
> > >
> > > Best,
> > > Ernie
> > >
> > > ps.  one consideration for message definition might be to use a more
> > > tabular presentation of messages followed by any
> > > definitions/cross-referencing... this is an example from a CDMA
> > protocol I
> > > have worked with in the past
> > >
> > > Location assignment message
> 

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-08 Thread Ernest Burghardt
+1 William, an even better example - that kind of representation will make
it so much better/easier for geode users to implement against, regardless
of language.

On Mon, May 8, 2017 at 2:48 PM, William Markito Oliveira <
william.mark...@gmail.com> wrote:

> +1
>
> I think I've shared this before, but Kafka also has good (tabular)
> representation for messages on their protocol.
>
> - https://kafka.apache.org/protocol#protocol_messages
> - https://kafka.apache.org/protocol#protocol_message_sets
>
> On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt 
> wrote:
>
> > Hello Geodes!
> >
> > Good discussion on what/how the messages will be/handled once a
> connection
> > is established.
> >
> > +1 to a simple initial handshake to establish version/supported features
> > that client/server will be communicating.
> >
> > From what I've seen so far in the proposal it is missing a definition for
> > the "connection"/disconnect messages.
> > - expected to see it here:
> > https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
> >
> > From a protocol perspective, this is currently a pain point for the
> > geode-native library.
> >
> > As Jake mentioned previously, having messages that are class-like and
> have
> > a singular job helps client developers by having an explicit protocol to
> > follow.
> >
> >
> > The basic case a developer is going to exercise is to connect/disconnect.
> > How to do this should be straightforward from the start.
> >
> > Geode probably does not need a 7 Layer OSI stack, but it might make sense
> > to have a couple layers:
> >
> > 1 - transport  (network socket)
> > 2 - protocol   (version/features)
> > 3 - messaging (do cluster work)
> >
> > e.g.
> > client library opens a socket to the server (layer 1 - check)
> > client/server perform handshake and the connection is OPEN (layer 2 -
> > check)
> > pipe is open for business, client/server do work freely (layer 3 - check)
> >
> > When this is sorted out I think a couple simple sequence or activity
> > diagrams would be very helpful to the visual-spatial folks in the
> > community.
> >
> >
> > Best,
> > Ernie
> >
> > ps.  one consideration for message definition might be to use a more
> > tabular presentation of messages followed by any
> > definitions/cross-referencing... this is an example from a CDMA
> protocol I
> > have worked with in the past
> >
> > Location assignment message
> >
> >   Field            Length (bits)
> >   ---------------  ------------------
> >   MessageID        8
> >   TransactionID    8
> >   LocationType     8
> >   LocationLength   8
> >   LocationValue    8 × LocationLength
> >
> >   1. MessageID - The access network shall set this field to 0x05.
> >   2. TransactionID - The access network shall increment this value for
> >      each new LocationAssignment message sent.
> >   3. LocationType - The access network shall set this field to the type
> >      of the location as specified in Table
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett 
> wrote:
> >
> > > On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra 
> > > wrote:
> > >
> > > >
> > > > 0. In first phase we are not doing chunking/fragmentation. And even
> > this
> > > > will be option for client.(
> > > > https://cwiki.apache.org/confluence/display/GEODE/
> > Message+Structure+and+
> > > Definition#MessageStructureandDefinition-Protocolnegotiation
> > > > )
> > > >
> > >
> > > I highly suggest initial handshake be more relaxed than specific
> "version
> > > number" or flags. Consider sending objects that indicate support for
> > > features or even a list of feature IDs. At connect server can send list
> > of
> > > feature IDs to the client. The client can respond with a set of feature
> > IDs
> > > it supports as well as any metadata associated with them, say default
> set
> > > of supported encodings.
> > >
> > >
> > > 1. Are you referring to websocket/spdy? But I think we are talking
> > > almost the same thing; maybe push an isPartialMessage flag with chunk
> > > length (Anthony's example below)?
> > > >
> > >
> > > I am not sure what you mean here but if you are talking about layering
> a
> > > channel protocol handler then I guess yes. The point is that each of
> > these
> > > behaviors should be encapsulated in specific layers and not intermixed
> > with
> > > the message.
> > >
> > >
> > > > 2. That's the part of the problem. Even if you need to serialize the
> > > > "String", you need to write length first and then need to write
> > > serialized
> > > > utf bytes. We can implement chunked input stream and can de-serialize
> > the
> > > > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> > > >
> > >
> > 

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-08 Thread William Markito Oliveira
+1

I think I've shared this before, but Kafka also has good (tabular)
representation for messages on their protocol.

- https://kafka.apache.org/protocol#protocol_messages
- https://kafka.apache.org/protocol#protocol_message_sets

On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt 
wrote:

> Hello Geodes!
>
> Good discussion on what/how the messages will be/handled once a connection
> is established.
>
> +1 to a simple initial handshake to establish version/supported features
> that client/server will be communicating.
>
> From what I've seen so far in the proposal it is missing a definition for
> the "connection"/disconnect messages.
> - expected to see it here:
> https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
>
> From a protocol perspective, this is currently a pain point for the
> geode-native library.
>
> As Jake mentioned previously, having messages that are class-like and have
> a singular job helps client developers by having an explicit protocol to
> follow.
>
>
> The basic case a developer is going to exercise is to connect/disconnect.
> How to do this should be straightforward from the start.
>
> Geode probably does not need a 7 Layer OSI stack, but it might make sense
> to have a couple layers:
>
> 1 - transport  (network socket)
> 2 - protocol   (version/features)
> 3 - messaging (do cluster work)
>
> e.g.
> client library opens a socket to the server (layer 1 - check)
> client/server perform handshake and the connection is OPEN (layer 2 -
> check)
> pipe is open for business, client/server do work freely (layer 3 - check)
>
> When this is sorted out I think a couple simple sequence or activity
> diagrams would be very helpful to the visual-spatial folks in the
> community.
>
>
> Best,
> Ernie
>
> ps.  one consideration for message definition might be to use a more
> tabular presentation of messages followed by any
> definitions/cross-referencing... this is an example from a CDMA protocol I
> have worked with in the past
>
> Location assignment message
>
>   Field            Length (bits)
>   ---------------  ------------------
>   MessageID        8
>   TransactionID    8
>   LocationType     8
>   LocationLength   8
>   LocationValue    8 × LocationLength
>
>   1. MessageID - The access network shall set this field to 0x05.
>   2. TransactionID - The access network shall increment this value for
>      each new LocationAssignment message sent.
>   3. LocationType - The access network shall set this field to the type
>      of the location as specified in Table
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett  wrote:
>
> > On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra 
> > wrote:
> >
> > >
> > > 0. In first phase we are not doing chunking/fragmentation. And even
> this
> > > will be option for client.(
> > > https://cwiki.apache.org/confluence/display/GEODE/
> Message+Structure+and+
> > Definition#MessageStructureandDefinition-Protocolnegotiation
> > > )
> > >
> >
> > I highly suggest initial handshake be more relaxed than specific "version
> > number" or flags. Consider sending objects that indicate support for
> > features or even a list of feature IDs. At connect server can send list
> of
> > feature IDs to the client. The client can respond with a set of feature
> IDs
> > it supports as well as any metadata associated with them, say default set
> > of supported encodings.
> >
> >
> > > 1. Are you referring to websocket/spdy? But I think we are talking almost
> > > the same thing; maybe push an isPartialMessage flag with chunk length
> > > (Anthony's example below)?
> > >
> >
> > I am not sure what you mean here but if you are talking about layering a
> > channel protocol handler then I guess yes. The point is that each of
> these
> > behaviors should be encapsulated in specific layers and not intermixed
> with
> > the message.
> >
> >
> > > 2. That's the part of the problem. Even if you need to serialize the
> > > "String", you need to write length first and then need to write
> > serialized
> > > utf bytes. We can implement chunked input stream and can de-serialize
> the
> > > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> > >
> >
> > Right, and in this case the length is never the length of the string, it
> is
> > the length of the byte encoding of the string. This is not known until
> the
> > encoding is complete. So by chunking we can write the length of smaller
> > buffers (from buffer pools) as the length of that sequence of bytes, the
> > last chunk terminated with length 0. Each of those chunks can be passed
> > to a UTF-8 to UTF-16 transcoder to create the String.
> >
> > -Jake
> >
>



-- 
~/William


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-08 Thread Ernest Burghardt
Hello Geodes!

Good discussion on what the messages will be and how they will be handled
once a connection is established.

+1 to a simple initial handshake to establish version/supported features
that client/server will be communicating.

From what I've seen so far, the proposal is missing a definition for
the "connection"/disconnect messages.
- expected to see it here:
https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API

From a protocol perspective, this is currently a pain point for the
geode-native library.

As Jake mentioned previously, having messages that are class-like and have
a singular job helps client developers by having an explicit protocol to
follow.


The basic case a developer is going to exercise is to connect/disconnect.
How to do this should be straightforward from the start.

Geode probably does not need a 7 Layer OSI stack, but it might make sense
to have a couple layers:

1 - transport  (network socket)
2 - protocol   (version/features)
3 - messaging (do cluster work)

e.g.
client library opens a socket to the server (layer 1 - check)
client/server perform handshake and the connection is OPEN (layer 2 - check)
pipe is open for business, client/server do work freely (layer 3 - check)
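
As a sketch only (these interface names are mine, nothing here is in the
proposal), the three layers could be expressed roughly as:

// Illustrative sketch of the three layers; none of these interfaces exist yet.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Set;

interface Transport {                     // layer 1: raw socket I/O
    void send(ByteBuffer frame) throws IOException;
    ByteBuffer receive() throws IOException;
    void close() throws IOException;
}

interface ProtocolSession {               // layer 2: result of the version/feature handshake
    int negotiatedVersion();
    Set<Integer> negotiatedFeatureIds();
}

interface Messaging {                     // layer 3: cluster work over an open session
    byte[] get(byte[] key) throws IOException;
    void put(byte[] key, byte[] value) throws IOException;
    void disconnect() throws IOException; // the explicit disconnect message asked for above
}

The point being that layer 3 never has to know about sockets or handshakes.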

When this is sorted out I think a couple simple sequence or activity
diagrams would be very helpful to the visual-spatial folks in the community.


Best,
Ernie

P.S. One consideration for message definition might be to use a more
tabular presentation of messages, followed by any
definitions/cross-referencing... this is an example from a CDMA protocol I
have worked with in the past:

Location assignment message

  Field            Length (bits)
  ---------------  ------------------
  MessageID        8
  TransactionID    8
  LocationType     8
  LocationLength   8
  LocationValue    8 × LocationLength

  1. MessageID - The access network shall set this field to 0x05.
  2. TransactionID - The access network shall increment this value for
     each new LocationAssignment message sent.
  3. LocationType - The access network shall set this field to the type
     of the location as specified in Table












On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett  wrote:

> On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra 
> wrote:
>
> >
> > 0. In first phase we are not doing chunking/fragmentation. And even this
> > will be option for client.(
> > https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+
> Definition#MessageStructureandDefinition-Protocolnegotiation
> > )
> >
>
> I highly suggest initial handshake be more relaxed than specific "version
> number" or flags. Consider sending objects that indicate support for
> features or even a list of feature IDs. At connect server can send list of
> feature IDs to the client. The client can respond with a set of feature IDs
> it supports as well as any metadata associated with them, say default set
> of supported encodings.
>
>
> > 1. Are you referring to websocket/spdy? But I think we are talking almost
> > the same thing; maybe push an isPartialMessage flag with chunk length
> > (Anthony's example below)?
> >
>
> I am not sure what you mean here but if you are talking about layering a
> channel protocol handler then I guess yes. The point is that each of these
> behaviors should be encapsulated in specific layers and not intermixed with
> the message.
>
>
> > 2. That's the part of the problem. Even if you need to serialize the
> > "String", you need to write length first and then need to write
> serialized
> > utf bytes. We can implement chunked input stream and can de-serialize the
> > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> >
>
> Right, and in this case the length is never the length of the string, it is
> the length of the byte encoding of the string. This is not known until the
> encoding is complete. So by chunking we can write the length of smaller
> buffers (from buffer pools) as the length of that sequence of bytes, the
> last chunk terminated with length 0. Each of those chunks can be passed to a
> UTF-8 to UTF-16 transcoder to create the String.
>
> -Jake
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-08 Thread Jacob Barrett
On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra  wrote:

>
> 0. In first phase we are not doing chunking/fragmentation. And even this
> will be option for client.(
> https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-Protocolnegotiation
> )
>

I highly suggest the initial handshake be more relaxed than a specific
"version number" or flags. Consider sending objects that indicate support for
features, or even a list of feature IDs. At connect, the server can send a
list of feature IDs to the client. The client can respond with the set of
feature IDs it supports, as well as any metadata associated with them, say a
default set of supported encodings.
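
For illustration, a minimal sketch of that exchange (the feature ids and
metadata values are made up; nothing is specified yet):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: server advertises feature ids, client answers with the subset
// it supports plus per-feature metadata (e.g. its default encodings).
final class FeatureHandshake {
    static final int FEATURE_CHUNKED_VALUES = 1;   // hypothetical feature ids
    static final int FEATURE_CHANNELS = 2;
    static final int FEATURE_ENCODINGS = 3;

    // Server side: advertise every feature id it supports at connect time.
    static Set<Integer> serverAdvertisement() {
        return new HashSet<>(Set.of(FEATURE_CHUNKED_VALUES, FEATURE_CHANNELS, FEATURE_ENCODINGS));
    }

    // Client side: reply with the subset it supports, attaching metadata where
    // it applies (for example, the default set of supported encodings).
    static Map<Integer, String> clientResponse(Set<Integer> advertised, Set<Integer> supported) {
        Map<Integer, String> accepted = new HashMap<>();
        for (int id : advertised) {
            if (supported.contains(id)) {
                accepted.put(id, id == FEATURE_ENCODINGS ? "PDX,JSON" : "");
            }
        }
        return accepted;
    }
}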


> 1. Are you referring to websocket/spdy? But I think we are talking almost
> the same thing; maybe push an isPartialMessage flag with chunk length
> (Anthony's example below)?
>

I am not sure what you mean here but if you are talking about layering a
channel protocol handler then I guess yes. The point is that each of these
behaviors should be encapsulated in specific layers and not intermixed with
the message.


> 2. That's the part of the problem. Even if you need to serialize the
> "String", you need to write length first and then need to write serialized
> utf bytes. We can implement chunked input stream and can de-serialize the
> object as it is coming (DataSerializable.fromData(ChunkedStream)).
>

Right, and in this case the length is never the length of the string; it is
the length of the byte encoding of the string. This is not known until the
encoding is complete. So by chunking we can write the length of smaller
buffers (from buffer pools) as the length of that sequence of bytes, with the
last chunk terminated by a length of 0. Each of those chunks can be passed to
a UTF-8 to UTF-16 transcoder to create the String.
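
A rough sketch of that writer side, assuming 4-byte chunk lengths and 1k
buffers (neither of which is settled):

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Encode the string incrementally into a small reusable buffer and emit each
// filled buffer as a length-prefixed chunk; a zero length terminates the value.
final class ChunkedStringWriter {
    static void write(String value, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer chars = CharBuffer.wrap(value);
        ByteBuffer chunk = ByteBuffer.allocate(1024);   // "small (say 1k)" chunks
        boolean done = false;
        while (!done) {
            CoderResult result = encoder.encode(chars, chunk, true);
            if (result.isUnderflow()) {                 // all characters consumed
                encoder.flush(chunk);
                done = true;
            }
            chunk.flip();
            if (chunk.hasRemaining()) {
                data.writeInt(chunk.remaining());       // chunk length
                data.write(chunk.array(), 0, chunk.remaining());
            }
            chunk.clear();
        }
        data.writeInt(0);                               // terminating zero-length chunk
    }
}

The receiver can feed each chunk straight into a UTF-8 decoder without ever
holding the whole encoded string.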

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Hitesh Khamesra

0. In the first phase we are not doing chunking/fragmentation. And even this
will be optional for the client.
(https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-Protocolnegotiation)
1. Are you referring to websocket/spdy? But I think we are talking about
almost the same thing; maybe push an isPartialMessage flag with chunk length
(Anthony's example below)?
2. That's part of the problem. Even if you just need to serialize a "String",
you need to write the length first and then write the serialized UTF bytes. We
can implement a chunked input stream and de-serialize the object as it is
coming in (DataSerializable.fromData(ChunkedStream)).
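
For illustration, such a chunked input stream might look like this (the
framing is assumed to be a 4-byte length followed by that many bytes,
terminated by a zero length):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Presents a sequence of length-prefixed chunks as one continuous stream, so a
// deserializer can read the value as it arrives instead of buffering it all.
final class ChunkedInputStream extends InputStream {
    private final DataInputStream in;
    private int remainingInChunk;   // bytes left in the current chunk
    private boolean finished;       // saw the terminating zero-length chunk

    ChunkedInputStream(InputStream in) {
        this.in = new DataInputStream(in);
    }

    @Override
    public int read() throws IOException {
        if (finished) {
            return -1;
        }
        if (remainingInChunk == 0) {
            remainingInChunk = in.readInt();   // next chunk length
            if (remainingInChunk == 0) {
                finished = true;               // zero length marks end of the value
                return -1;
            }
        }
        remainingInChunk--;
        return in.read();
    }
}

An object could then be read incrementally with something like
obj.fromData(new DataInputStream(new ChunkedInputStream(socketIn))).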




  From: Jacob Barrett <jbarr...@pivotal.io>
 To: dev@geode.apache.org; Hitesh Khamesra <hitesh...@yahoo.com> 
Cc: Anthony Baker <aba...@pivotal.io>
 Sent: Friday, May 5, 2017 7:29 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   

On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra <hitesh...@yahoo.com.invalid> 
wrote:

Basically, thread/layer should not hold any resources while serializing the 
object or chunk.  We should be able to see this flow (ms1-chunk1, msg2-chunk1, 
ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)


Correct, but putting that in the message layer is not appropriate. The simple
solution is that multiple channels can be achieved with multiple sockets.
The later optimization is to add a channel multiplexer layer between the
message and socket layers.

If we put it in the message layer, not only does it force the message to
tackle something it shouldn't be concerned with, reassembling itself, but it
also forces all implementors to tackle this logic up front. By layering we can
release without it, implementors aren't forced into understanding the logic,
and later we can release the layers and the client can negotiate.
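
As an illustration of that later optimization (the frame layout and names are
assumptions, not part of the proposal):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Every frame on the single socket is tagged with a channel id; the
// demultiplexer routes it to that channel's queue, so the message layer above
// only ever sees its own channel.
final class ChannelDemultiplexer {
    private final Map<Integer, BlockingQueue<byte[]>> channels = new ConcurrentHashMap<>();

    BlockingQueue<byte[]> channel(int channelId) {
        return channels.computeIfAbsent(channelId, id -> new LinkedBlockingQueue<>());
    }

    // Reads [channelId][length][payload] frames off the shared socket stream
    // until the socket closes, handing each payload to the owning channel.
    void pump(InputStream socketIn) throws IOException, InterruptedException {
        DataInputStream in = new DataInputStream(socketIn);
        while (true) {
            int channelId = in.readInt();
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            channel(channelId).put(payload);
        }
    }
}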
 
On other pdx note: to de-serialize the pdx we need length of serialized bytes, 
so that we can read field offset from serialized stream, and then can read 
field value. Though, I can imagine with the help of pdxType, we can interpret 
serialized stream.


Yes, so today PDX serialization would be no worse: the PDX serializer would
have to buffer, but others may not have to. The length of the buffered PDX
could be used as the first chunk length and the value completed in a single
chunk. Although, I suspect the amortized overhead of splitting the chunks will
be nil anyway.
The point is that the message encoding of values should NOT have any unbounded
length fields and require long or many buffers to complete serialization. By
chunking you can accomplish this by not needing to buffer the whole stream,
just small (say 1k) chunks at a time to get the chunk length.
Buffers == Latency
-Jake


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Jacob Barrett
In either case you packetize (serialize the message protocol) to buffers (fixed
sizes and pooled) and flush the buffers to the socket. Preferably using an async
socket framework to do all the heavy lifting for you.
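
A minimal sketch of that idea, using blocking NIO instead of an async
framework for brevity (the buffer count and size are arbitrary):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Serialize into fixed-size pooled buffers and flush each one to the socket,
// rather than building the whole message in memory first.
final class PooledBufferWriter {
    private final BlockingQueue<ByteBuffer> pool;
    private final SocketChannel socket;

    PooledBufferWriter(SocketChannel socket, int buffers, int bufferSize) {
        this.socket = socket;
        this.pool = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            pool.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    void write(byte[] bytes) throws IOException, InterruptedException {
        int offset = 0;
        while (offset < bytes.length) {
            ByteBuffer buffer = pool.take();                  // borrow a pooled buffer
            int n = Math.min(buffer.remaining(), bytes.length - offset);
            buffer.put(bytes, offset, n);
            buffer.flip();
            while (buffer.hasRemaining()) {
                socket.write(buffer);                         // flush to the socket
            }
            buffer.clear();
            pool.put(buffer);                                 // return it to the pool
            offset += n;
        }
    }
}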


Sent from my iPhone

> On May 5, 2017, at 11:07 AM, Bruce Schuchardt  wrote:
> 
> Yes, of course it does but we don't serialize directly to a socket output 
> stream because it's slow.  I agree that this could be left out and added 
> later as an optimization.
> 
>> On 5/5/2017 at 10:33 AM, Galen M O'Sullivan wrote:
>> I think TCP does exactly this for us.
>> 
>> On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt 
>> wrote:
>> 
>>> This is very similar to how peer-to-peer messaging is performed in Geode.
>>> Messages are serialized to a stream that knows how to optimally "chunk" the
>>> bytes into fixed-size packets.  On the receiving side these are fed into a
>>> similar input stream for deserialization.  The message only contains
>>> information about the operation it represents.
>>> 
>>> Why don't we do something similar for the new client/server protocol?
>>> 
>>> 
On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:
 
 On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
 
 wrote:
 
 Basically, thread/layer should not hold any resources while serializing
> the object or chunk.  We should be able to see this flow (ms1-chunk1,
> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
> 
> Correct, but putting that in the message layer is not appropriate. The
 simple solution is that the multiple channels can be achieved with
 multiple
 sockets. The later optimization is to add a channel multiplexer layer
 between the message and socket layers.
 
 If we put it in the message layer, not only does it for the message to
 tackle something it shouldn't be concerned with, reassembling itself, but
 it also forces all implementors to tackle this logic up front. By layering
 we can release without, implementors aren't forced into understanding the
 logic, and later we can release the layers and the client can negotiate.
 
 
 
 On other pdx note: to de-serialize the pdx we need length of serialized
> bytes, so that we can read field offset from serialized stream, and then
> can read field value. Though, I can imagine with the help of pdxType, we
> can interpret serialized stream.
> 
> Yes, so today PDX serialization would be no worse, the PDX serializer
 would
 have to buffer, but other may not have to. The length of the buffered PDX
 could be used as the first chunk length and complete in single chunk.
 Although, I suspect that amortized overhead of splitting the chunks  will
 be nil anyway.
 
 The point is that the message encoding of values should NOT have any
 unbounded length fields and require long or many buffers to complete
 serialization. By chunking you can accomplish this by not needing to
 buffer
 the whole stream, just small (say 1k), chunks at a time to get the chunk
 length.
 
 Buffers == Latency
 
 -Jake
 
 
> 


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Bruce Schuchardt
Yes, of course it does but we don't serialize directly to a socket 
output stream because it's slow.  I agree that this could be left out 
and added later as an optimization.


On 5/5/2017 at 10:33 AM, Galen M O'Sullivan wrote:

I think TCP does exactly this for us.

On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt 
wrote:


This is very similar to how peer-to-peer messaging is performed in Geode.
Messages are serialized to a stream that knows how to optimally "chunk" the
bytes into fixed-size packets.  On the receiving side these are fed into a
similar input stream for deserialization.  The message only contains
information about the operation it represents.

Why don't we do something similar for the new client/server protocol?


On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:


On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra

wrote:

Basically, thread/layer should not hold any resources while serializing

the object or chunk.  We should be able to see this flow (ms1-chunk1,
msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)

Correct, but putting that in the message layer is not appropriate. The

simple solution is that the multiple channels can be achieved with
multiple
sockets. The later optimization is to add a channel multiplexer layer
between the message and socket layers.

If we put it in the message layer, not only does it for the message to
tackle something it shouldn't be concerned with, reassembling itself, but
it also forces all implementors to tackle this logic up front. By layering
we can release without, implementors aren't forced into understanding the
logic, and later we can release the layers and the client can negotiate.



On other pdx note: to de-serialize the pdx we need length of serialized

bytes, so that we can read field offset from serialized stream, and then
can read field value. Though, I can imagine with the help of pdxType, we
can interpret serialized stream.

Yes, so today PDX serialization would be no worse, the PDX serializer

would
have to buffer, but other may not have to. The length of the buffered PDX
could be used as the first chunk length and complete in single chunk.
Although, I suspect that amortized overhead of splitting the chunks  will
be nil anyway.

The point is that the message encoding of values should NOT have any
unbounded length fields and require long or many buffers to complete
serialization. By chunking you can accomplish this by not needing to
buffer
the whole stream, just small (say 1k), chunks at a time to get the chunk
length.

Buffers == Latency

-Jake






Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Jacob Barrett
It does! 

Both fragmenting and multiple channels as multiple sockets. 

Sent from my iPhone

> On May 5, 2017, at 10:33 AM, Galen M O'Sullivan  wrote:
> 
> I think TCP does exactly this for us.
> 
> On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt 
> wrote:
> 
>> This is very similar to how peer-to-peer messaging is performed in Geode.
>> Messages are serialized to a stream that knows how to optimally "chunk" the
>> bytes into fixed-size packets.  On the receiving side these are fed into a
>> similar input stream for deserialization.  The message only contains
>> information about the operation it represents.
>> 
>> Why don't we do something similar for the new client/server protocol?
>> 
>> 
>>> On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:
>>> 
>>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>>> 
>>> wrote:
>>> 
>>> Basically, thread/layer should not hold any resources while serializing
 the object or chunk.  We should be able to see this flow (ms1-chunk1,
 msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
 
 Correct, but putting that in the message layer is not appropriate. The
>>> simple solution is that the multiple channels can be achieved with
>>> multiple
>>> sockets. The later optimization is to add a channel multiplexer layer
>>> between the message and socket layers.
>>> 
>>> If we put it in the message layer, not only does it for the message to
>>> tackle something it shouldn't be concerned with, reassembling itself, but
>>> it also forces all implementors to tackle this logic up front. By layering
>>> we can release without, implementors aren't forced into understanding the
>>> logic, and later we can release the layers and the client can negotiate.
>>> 
>>> 
>>> 
>>> On other pdx note: to de-serialize the pdx we need length of serialized
 bytes, so that we can read field offset from serialized stream, and then
 can read field value. Though, I can imagine with the help of pdxType, we
 can interpret serialized stream.
 
 Yes, so today PDX serialization would be no worse, the PDX serializer
>>> would
>>> have to buffer, but other may not have to. The length of the buffered PDX
>>> could be used as the first chunk length and complete in single chunk.
>>> Although, I suspect that amortized overhead of splitting the chunks  will
>>> be nil anyway.
>>> 
>>> The point is that the message encoding of values should NOT have any
>>> unbounded length fields and require long or many buffers to complete
>>> serialization. By chunking you can accomplish this by not needing to
>>> buffer
>>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>>> length.
>>> 
>>> Buffers == Latency
>>> 
>>> -Jake
>>> 
>>> 
>> 


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Galen M O'Sullivan
I think TCP does exactly this for us.

On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt 
wrote:

> This is very similar to how peer-to-peer messaging is performed in Geode.
> Messages are serialized to a stream that knows how to optimally "chunk" the
> bytes into fixed-size packets.  On the receiving side these are fed into a
> similar input stream for deserialization.  The message only contains
> information about the operation it represents.
>
> Why don't we do something similar for the new client/server protocol?
>
>
> On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:
>
>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>> 
>> wrote:
>>
>> Basically, thread/layer should not hold any resources while serializing
>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>>
>>> Correct, but putting that in the message layer is not appropriate. The
>> simple solution is that the multiple channels can be achieved with
>> multiple
>> sockets. The later optimization is to add a channel multiplexer layer
>> between the message and socket layers.
>>
>> If we put it in the message layer, not only does it for the message to
>> tackle something it shouldn't be concerned with, reassembling itself, but
>> it also forces all implementors to tackle this logic up front. By layering
>> we can release without, implementors aren't forced into understanding the
>> logic, and later we can release the layers and the client can negotiate.
>>
>>
>>
>> On other pdx note: to de-serialize the pdx we need length of serialized
>>> bytes, so that we can read field offset from serialized stream, and then
>>> can read field value. Though, I can imagine with the help of pdxType, we
>>> can interpret serialized stream.
>>>
>>> Yes, so today PDX serialization would be no worse, the PDX serializer
>> would
>> have to buffer, but other may not have to. The length of the buffered PDX
>> could be used as the first chunk length and complete in single chunk.
>> Although, I suspect that amortized overhead of splitting the chunks  will
>> be nil anyway.
>>
>> The point is that the message encoding of values should NOT have any
>> unbounded length fields and require long or many buffers to complete
>> serialization. By chunking you can accomplish this by not needing to
>> buffer
>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>> length.
>>
>> Buffers == Latency
>>
>> -Jake
>>
>>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Jacob Barrett
I would leave it for a later optimization.

Sent from my iPhone

> On May 5, 2017, at 9:08 AM, Bruce Schuchardt  wrote:
> 
> This is very similar to how peer-to-peer messaging is performed in Geode.  
> Messages are serialized to a stream that knows how to optimally "chunk" the 
> bytes into fixed-size packets.  On the receiving side these are fed into a 
> similar input stream for deserialization.  The message only contains 
> information about the operation it represents.
> 
> Why don't we do something similar for the new client/server protocol?
> 
>> On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:
>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra 
>> wrote:
>> 
>>> Basically, thread/layer should not hold any resources while serializing
>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>> 
>> Correct, but putting that in the message layer is not appropriate. The
>> simple solution is that the multiple channels can be achieved with multiple
>> sockets. The later optimization is to add a channel multiplexer layer
>> between the message and socket layers.
>> 
>> If we put it in the message layer, not only does it for the message to
>> tackle something it shouldn't be concerned with, reassembling itself, but
>> it also forces all implementors to tackle this logic up front. By layering
>> we can release without, implementors aren't forced into understanding the
>> logic, and later we can release the layers and the client can negotiate.
>> 
>> 
>> 
>>> On other pdx note: to de-serialize the pdx we need length of serialized
>>> bytes, so that we can read field offset from serialized stream, and then
>>> can read field value. Though, I can imagine with the help of pdxType, we
>>> can interpret serialized stream.
>>> 
>> Yes, so today PDX serialization would be no worse, the PDX serializer would
>> have to buffer, but other may not have to. The length of the buffered PDX
>> could be used as the first chunk length and complete in single chunk.
>> Although, I suspect that amortized overhead of splitting the chunks  will
>> be nil anyway.
>> 
>> The point is that the message encoding of values should NOT have any
>> unbounded length fields and require long or many buffers to complete
>> serialization. By chunking you can accomplish this by not needing to buffer
>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>> length.
>> 
>> Buffers == Latency
>> 
>> -Jake
>> 
> 


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Bruce Schuchardt
This is very similar to how peer-to-peer messaging is performed in 
Geode.  Messages are serialized to a stream that knows how to optimally 
"chunk" the bytes into fixed-size packets.  On the receiving side these 
are fed into a similar input stream for deserialization.  The message 
only contains information about the operation it represents.


Why don't we do something similar for the new client/server protocol?
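
For comparison only, a toy version of such a chunking stream (this is not the
actual peer-to-peer implementation) could look like:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Anything serialized to this stream is emitted as fixed-size, length-prefixed
// packets, each flushed as it fills rather than buffering the whole message.
final class FixedSizeChunkingOutputStream extends OutputStream {
    private final DataOutputStream out;
    private final byte[] packet;
    private int used;

    FixedSizeChunkingOutputStream(OutputStream out, int packetSize) {
        this.out = new DataOutputStream(out);
        this.packet = new byte[packetSize];
    }

    @Override
    public void write(int b) throws IOException {
        packet[used++] = (byte) b;
        if (used == packet.length) {
            flushPacket();
        }
    }

    private void flushPacket() throws IOException {
        if (used > 0) {
            out.writeInt(used);          // packet length
            out.write(packet, 0, used);  // packet bytes
            used = 0;
        }
    }

    @Override
    public void close() throws IOException {
        flushPacket();
        out.writeInt(0);                 // end-of-message marker
        out.flush();
    }
}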

On 5/5/2017 at 7:28 AM, Jacob Barrett wrote:

On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra 
wrote:


Basically, thread/layer should not hold any resources while serializing
the object or chunk.  We should be able to see this flow (ms1-chunk1,
msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)


Correct, but putting that in the message layer is not appropriate. The
simple solution is that the multiple channels can be achieved with multiple
sockets. The later optimization is to add a channel multiplexer layer
between the message and socket layers.

If we put it in the message layer, not only does it for the message to
tackle something it shouldn't be concerned with, reassembling itself, but
it also forces all implementors to tackle this logic up front. By layering
we can release without, implementors aren't forced into understanding the
logic, and later we can release the layers and the client can negotiate.




On other pdx note: to de-serialize the pdx we need length of serialized
bytes, so that we can read field offset from serialized stream, and then
can read field value. Though, I can imagine with the help of pdxType, we
can interpret serialized stream.


Yes, so today PDX serialization would be no worse, the PDX serializer would
have to buffer, but other may not have to. The length of the buffered PDX
could be used as the first chunk length and complete in single chunk.
Although, I suspect that amortized overhead of splitting the chunks  will
be nil anyway.

The point is that the message encoding of values should NOT have any
unbounded length fields and require long or many buffers to complete
serialization. By chunking you can accomplish this by not needing to buffer
the whole stream, just small (say 1k), chunks at a time to get the chunk
length.

Buffers == Latency

-Jake





Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-05 Thread Jacob Barrett
On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra 
wrote:

> Basically, thread/layer should not hold any resources while serializing
> the object or chunk.  We should be able to see this flow (ms1-chunk1,
> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>

Correct, but putting that in the message layer is not appropriate. The
simple solution is that multiple channels can be achieved with multiple
sockets. The later optimization is to add a channel multiplexer layer
between the message and socket layers.

If we put it in the message layer, not only does it force the message to
tackle something it shouldn't be concerned with, reassembling itself, but
it also forces all implementors to tackle this logic up front. By layering
we can release without it, implementors aren't forced into understanding the
logic, and later we can release the layers and the client can negotiate.



> On other pdx note: to de-serialize the pdx we need length of serialized
> bytes, so that we can read field offset from serialized stream, and then
> can read field value. Though, I can imagine with the help of pdxType, we
> can interpret serialized stream.
>

Yes, so today PDX serialization would be no worse: the PDX serializer would
have to buffer, but others may not have to. The length of the buffered PDX
could be used as the first chunk length and the value completed in a single
chunk. Although, I suspect the amortized overhead of splitting the chunks
will be nil anyway.

The point is that the message encoding of values should NOT have any
unbounded length fields and require long or many buffers to complete
serialization. By chunking you can accomplish this by not needing to buffer
the whole stream, just small (say 1k) chunks at a time to get the chunk
length.

Buffers == Latency

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread William Markito Oliveira
+1 for that as well
On Thu, May 4, 2017 at 5:21 PM Dan Smith  wrote:

> >
> > I wouldn't tackle that at this layer. I would suggest adding a layer
> > between the message and TCP that creates channels that are opaque to the
> > message layer above. The message layer wouldn't know if it was talking to
> > multiple sockets to the client or single socket with multiple channels.
> >
>
> ++1 on that!
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Dan Smith
>
> I wouldn't tackle that at this layer. I would suggest adding a layer
> between the message and TCP that creates channels that are opaque to the
> message layer above. The message layer wouldn't know if it was talking to
> multiple sockets to the client or single socket with multiple channels.
>

++1 on that!


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Hitesh Khamesra
>>> a. Now these two chunks will go continuous. 

>>They would appear continuous to the object serialization layer.

One benefit of a messageHeader with chunk is that it gives us the ability to
write different messages (multiplexing) on the same socket. And if a thread is
ready it can write its message. Though we are not there yet, that will be
required for a single-socket async architecture.





From: Jacob Barrett <jbarr...@pivotal.io>
To: dev@geode.apache.org 
Cc: Anthony Baker <aba...@pivotal.io>
Sent: Thursday, May 4, 2017 12:48 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal





Sent from my iPhone

> On May 4, 2017, at 12:03 PM, Hitesh Khamesra <hitesh...@yahoo.com.INVALID> 
> wrote:
> 
> And len 0 would indicate end of the message? 
> 
> 
> a. Now these two chunks will go continuous. 

They would appear continuous to the object serialization layer.



> 
> 
> b. If it's PDX encoded, then the pdx header (1byte:pdxid 4byte:len
> 4byte:typeId) requires the size of all the pdx serialized bytes. So we know
> the "size" of the data upfront here.

We could define the InputSource for the Value part such that
InputSource.getLength() could return a known length, or -1 if the length is
unknown. If the length is reasonable then the object could be encoded with a
single chunk of size InputSource.getLength() followed by a 0 chunk.

Clients are likely dealing with domain objects where the serialized length is
not known until serialization is complete. This would require buffering to get
the length. Buffering adds heap pressure and latency.
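
A sketch of that decision, with InputSource as the hypothetical interface
described above:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, following the suggestion above: a known length becomes a single
// chunk followed by the 0 terminator; an unknown (or large) length is streamed
// as a series of bounded chunks.
interface InputSource {
    long getLength() throws IOException;    // -1 if the length is not known up front
    InputStream open() throws IOException;
}

final class ValueEncoder {
    static void encode(InputSource source, DataOutputStream out) throws IOException {
        long length = source.getLength();
        try (InputStream in = source.open()) {
            if (length > 0 && length <= 8192) {
                byte[] whole = in.readNBytes((int) length);  // already buffered: one chunk
                out.writeInt(whole.length);
                out.write(whole);
            } else if (length != 0) {
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) > 0) {           // stream bounded chunks
                    out.writeInt(n);
                    out.write(chunk, 0, n);
                }
            }
        }
        out.writeInt(0);                                     // terminating zero-length chunk
    }
}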

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Hitesh Khamesra
And len 0 would indicate the end of the message?


a. Now these two chunks will go out continuously.


b. If it's PDX encoded, then the pdx header (1byte:pdxid 4byte:len
4byte:typeId) requires the size of all the pdx serialized bytes. So we know
the "size" of the data upfront here.


c. Let's say the region value is just a long byte[]: then we have the "size"
to send the message.


So in both cases we know the "size" of the serialized bytes (payload). So
possibly we don't need to chunk that message and can let TCP take care of it?


It seems we should walk through some more use cases to understand this better.



Thanks.
Hitesh




From: Anthony Baker <aba...@pivotal.io>
To: Hitesh Khamesra <hitesh...@yahoo.com> 
Cc: "dev@geode.apache.org" <dev@geode.apache.org>
Sent: Thursday, May 4, 2017 11:20 AM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



There would be one Message containing a single MessageHeader and a single 
MessageBody.  A PDX EncodedValue containing 1242 bytes that are chunked would 
look something like this:

PDX 1000 byte[1000] 242 byte[242] 0


Anthony



> On May 4, 2017, at 10:38 AM, Hitesh Khamesra <hitesh...@yahoo.com> wrote:
> 
> Hi Anthony:
> 
> Help me to understand data chunking here?
> 
>>> bytes => arbitrary byte[] that can be chunked
> 
> Message => MessageHeader MessageBody
> 
> So lets say we want to send long byte[] into two chunks, then we will send 
> two messages? And other side will combine those two messages using 
> "correlationId" ?
> 
> Thanks.
> HItesh
> 
> 
> 
> 
> 
> From: Anthony Baker <aba...@pivotal.io>
> To: dev@geode.apache.org 
> Cc: Hitesh Khamesra <hitesh...@yahoo.com>
> Sent: Wednesday, May 3, 2017 5:42 PM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
> 
> 
> 
> 
>> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <gosulli...@pivotal.io> wrote:
>> 
>> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
>> hitesh...@yahoo.com.invalid> wrote:
>> 
>>> Absolutely, it's an implementation detail.
>>> 
>> This doesn't answer Dan's comment. Do you think fragmentation should be
>> taken care of by the TCP layer or the protocol should deal with it
>> specifically?
> 
> There’s some really good feedback and discussion in this thread!  Here are a 
> few thoughts:
> 
> 1) Optional metadata should be used for fields that are generally applicable 
> across all messages.  If a metadata field is required or only applies to a 
> small set of messages, it should become part of a message definition.  Of 
> course there’s some grey area here.
> 
> 2) I think we should pull out the message fragmentation support to avoid some 
> significant complexity.  We can later add a fragmentation / envelope layer on 
> top without disrupting the current proposal.  I do think we should add the 
> capability for chunking data (more on that below).
> 
> 3) I did not find any discussion of message pipelining (queuing multiple 
> requests on a socket without waiting for a response) or out-of-order 
> responses.  What is the plan for these capabilities and how will that affect 
> consistency?  What about retries?
> 
> 4) Following is an alternative definition with these characteristics:
> 
> - Serialized data can either be primitive or encoded values.  Encoded values 
> are chunked as needed to break up large objects into a series of smaller 
> parts.
> - Because values can be chunked, the size field is removed.  This allows the 
> message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type with a 
> new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able to 
> understand.
> - The metadata map is not used for fields that belong in the message body.  I 
> think it’s much easier to write a spec without if statements :-)
> 
> Message => MessageHeader MessageBody
> 
> MessageHeader => correlationId metadata
>correlationId => integer
>metadata => count (key value)*
>count => integer
>key => string
>value => string
> 
> MessageBody => apiId body
>apiId => integer
>body => (see specific definitions)
> 
> GetRequest => 0 acceptEncoding key
>0 => the API id
>acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>key => EncodedValue
> 
> GetResponse => 1 value
>1 => the API id
>value => EncodedValue
> 
> PutRequest => 2 eventId key value
>2 => the API id
>  

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Anthony Baker
There would be one Message containing a single MessageHeader and a single 
MessageBody.  A PDX EncodedValue containing 1242 bytes that are chunked would 
look something like this:

PDX 1000 byte[1000] 242 byte[242] 0
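
Rendered as code (the one-byte encoding id and four-byte chunk lengths are
assumptions, not part of the proposal), that framing is just:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Worked sketch of the example above: a 1242-byte PDX value written as
// [encoding][1000][1000 bytes][242][242 bytes][0].
final class EncodedValueExample {
    static final int ENCODING_PDX = 1;        // hypothetical encoding id
    static final int CHUNK_SIZE = 1000;

    static byte[] encode(byte[] serializedPdx) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(ENCODING_PDX);
        for (int offset = 0; offset < serializedPdx.length; offset += CHUNK_SIZE) {
            int n = Math.min(CHUNK_SIZE, serializedPdx.length - offset);
            out.writeInt(n);                  // chunk length: 1000, then 242 for 1242 bytes
            out.write(serializedPdx, offset, n);
        }
        out.writeInt(0);                      // terminating zero-length chunk
        return bytes.toByteArray();
    }
}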


Anthony


> On May 4, 2017, at 10:38 AM, Hitesh Khamesra <hitesh...@yahoo.com> wrote:
> 
> Hi Anthony:
> 
> Help me to understand data chunking here?
> 
>>> bytes => arbitrary byte[] that can be chunked
> 
> Message => MessageHeader MessageBody
> 
> So lets say we want to send long byte[] into two chunks, then we will send 
> two messages? And other side will combine those two messages using 
> "correlationId" ?
> 
> Thanks.
> HItesh
> 
> 
> 
> 
> 
> From: Anthony Baker <aba...@pivotal.io>
> To: dev@geode.apache.org 
> Cc: Hitesh Khamesra <hitesh...@yahoo.com>
> Sent: Wednesday, May 3, 2017 5:42 PM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
> 
> 
> 
> 
>> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <gosulli...@pivotal.io> wrote:
>> 
>> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
>> hitesh...@yahoo.com.invalid> wrote:
>> 
>>> Absolutely, it's an implementation detail.
>>> 
>> This doesn't answer Dan's comment. Do you think fragmentation should be
>> taken care of by the TCP layer or the protocol should deal with it
>> specifically?
> 
> There’s some really good feedback and discussion in this thread!  Here are a 
> few thoughts:
> 
> 1) Optional metadata should be used for fields that are generally applicable 
> across all messages.  If a metadata field is required or only applies to a 
> small set of messages, it should become part of a message definition.  Of 
> course there’s some grey area here.
> 
> 2) I think we should pull out the message fragmentation support to avoid some 
> significant complexity.  We can later add a fragmentation / envelope layer on 
> top without disrupting the current proposal.  I do think we should add the 
> capability for chunking data (more on that below).
> 
> 3) I did not find any discussion of message pipelining (queuing multiple 
> requests on a socket without waiting for a response) or out-of-order 
> responses.  What is the plan for these capabilities and how will that affect 
> consistency?  What about retries?
> 
> 4) Following is an alternative definition with these characteristics:
> 
> - Serialized data can either be primitive or encoded values.  Encoded values 
> are chunked as needed to break up large objects into a series of smaller 
> parts.
> - Because values can be chunked, the size field is removed.  This allows the 
> message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type with a 
> new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able to 
> understand.
> - The metadata map is not used for fields that belong in the message body.  I 
> think it’s much easier to write a spec without if statements :-)
> 
> Message => MessageHeader MessageBody
> 
> MessageHeader => correlationId metadata
>correlationId => integer
>metadata => count (key value)*
>count => integer
>key => string
>value => string
> 
> MessageBody => apiId body
>apiId => integer
>body => (see specific definitions)
> 
> GetRequest => 0 acceptEncoding key
>0 => the API id
>acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>key => EncodedValue
> 
> GetResponse => 1 value
>1 => the API id
>value => EncodedValue
> 
> PutRequest => 2 eventId key value
>2 => the API id
>eventId => clientId threadId sequenceId
>clientId => string
>threadId => integer
>sequenceId => integer
>key => EncodedValue
>value => EncodedValue
> 
> EncodedValue => encoding (boolean | integer | number | string | ((length 
> bytes)* 0))
>encoding => (define some encodings for byte[], JSON, PDX, *, etc)
>boolean => TRUE or FALSE
>integer => a signed integer value
>number => a decimal value corresponding to IEEE 754
>string => UTF-8 text
>bytes => arbitrary byte[] that can be chunked
> 
> 
> Anthony



Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Hitesh Khamesra
Hi Anthony:

Help me to understand data chunking here?

>> bytes => arbitrary byte[] that can be chunked

Message => MessageHeader MessageBody

So let's say we want to send a long byte[] in two chunks; then we will send two
messages? And the other side will combine those two messages using "correlationId"?

Thanks.
HItesh





From: Anthony Baker <aba...@pivotal.io>
To: dev@geode.apache.org 
Cc: Hitesh Khamesra <hitesh...@yahoo.com>
Sent: Wednesday, May 3, 2017 5:42 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal




> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <gosulli...@pivotal.io> wrote:
> 
> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hitesh...@yahoo.com.invalid> wrote:
> 
>> Absolutely, it's an implementation detail.
>> 
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?

There’s some really good feedback and discussion in this thread!  Here are a 
few thoughts:

1) Optional metadata should be used for fields that are generally applicable 
across all messages.  If a metadata field is required or only applies to a 
small set of messages, it should become part of a message definition.  Of 
course there’s some grey area here.

2) I think we should pull out the message fragmentation support to avoid some 
significant complexity.  We can later add a fragmentation / envelope layer on 
top without disrupting the current proposal.  I do think we should add the 
capability for chunking data (more on that below).

3) I did not find any discussion of message pipelining (queuing multiple 
requests on a socket without waiting for a response) or out-of-order responses. 
 What is the plan for these capabilities and how will that affect consistency?  
What about retries?

4) Following is an alternative definition with these characteristics:

- Serialized data can either be primitive or encoded values.  Encoded values 
are chunked as needed to break up large objects into a series of smaller parts.
- Because values can be chunked, the size field is removed.  This allows the 
message to be streamed to the socket incrementally.
- The apiVersion is removed because we can just define a new body type with a 
new apiId (e.g. GetRequest2 with apiId = 1292).
- The GetRequest tells the server what kind of encoding the client is able to 
understand.
- The metadata map is not used for fields that belong in the message body.  I 
think it’s much easier to write a spec without if statements :-)

Message => MessageHeader MessageBody

MessageHeader => correlationId metadata
correlationId => integer
metadata => count (key value)*
count => integer
key => string
value => string

MessageBody => apiId body
apiId => integer
body => (see specific definitions)

GetRequest => 0 acceptEncoding key
0 => the API id
acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
key => EncodedValue

GetResponse => 1 value
1 => the API id
value => EncodedValue

PutRequest => 2 eventId key value
2 => the API id
eventId => clientId threadId sequenceId
clientId => string
threadId => integer
sequenceId => integer
key => EncodedValue
value => EncodedValue

EncodedValue => encoding (boolean | integer | number | string | ((length 
bytes)* 0))
encoding => (define some encodings for byte[], JSON, PDX, *, etc)
boolean => TRUE or FALSE
integer => a signed integer value
number => a decimal value corresponding to IEEE 754
string => UTF-8 text
bytes => arbitrary byte[] that can be chunked


Anthony


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Bruce Schuchardt

It's becoming clear that the document needs a section on EventIds.

EventIds aren't opaque to the server.  They are comparable objects and 
are used by the cache to prevent replay of older (operation eventId < 
recorded eventId) operations on the cache and on subscription queues.  
They are also used to prevent sending operations originating from a 
client back to that client in its subscription queue.


A thread's sequenceId should be incremented for each operation sent to 
the server.


In my opinion EventIds are optional for clients and only need to be 
implemented if clients are going to retry operations.  If a client 
doesn't send an EventId to the server one will be generated on the 
server for the operation.
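
To make those semantics concrete, here is a rough sketch of per-thread sequenceId generation on the client and the ordering check on the server; all class and method names are illustrative, not part of the proposal:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; none of these names come from the proposal.
final class EventIdSketch {

    // Client side: a monotonically increasing sequenceId per thread,
    // incremented for each operation sent to the server.
    static final class ClientSide {
        private final ThreadLocal<AtomicLong> sequence =
            ThreadLocal.withInitial(AtomicLong::new);

        long nextSequenceId() {
            return sequence.get().incrementAndGet();
        }
    }

    // Server side: remember the highest sequenceId seen per (clientId, threadId)
    // and skip anything older or equal, i.e. a replayed or retried operation.
    // (A real implementation would make this check-and-update atomic.)
    static final class ServerSide {
        private final Map<String, Long> highestSeen = new ConcurrentHashMap<>();

        boolean shouldApply(String clientId, long threadId, long sequenceId) {
            String key = clientId + ":" + threadId;
            Long recorded = highestSeen.get(key);
            if (recorded != null && sequenceId <= recorded) {
                return false;   // operation eventId <= recorded eventId: do not replay
            }
            highestSeen.put(key, sequenceId);
            return true;
        }
    }
}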


On 5/4/2017 at 8:46 AM, Jacob Barrett wrote:

The eventId is really just a once token, right? Meaning that it's rather
opaque to the server and intended to keep the server from replaying a
request that the client may have retried but that was actually successful. If
it is opaque to the server, then why encode all these specific identifiers?
It seems to me it could be optional, for one, and could simply be a variant int
or byte[]. The server just needs to stash the once tokens and make sure it
doesn't get a duplicate on this client stream.




Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-04 Thread Jacob Barrett
+1

On Wed, May 3, 2017 at 5:43 PM Anthony Baker  wrote:

>
> 2) I think we should pull out the message fragmentation support to avoid
> some significant complexity.  We can later add a fragmentation / envelope
> layer on top without disrupting the current proposal.  I do think we should
> add the capability for chunking data (more on that below).
>

+10. Like any good engineering practice, we need to keep objects well
encapsulated and focused on their singular task. A message would not be
represented as an object of "partials" but as a whole message object, so why
treat it any differently when serialized? The layer below it can chunk it
if necessary. Initially there is nothing between the message (the lowest level in
our stack) and the TCP socket; TCP will fragment as needed and the full message is
delivered up the stack. If in the future we want to
multichannel/interleave/pipeline (whatever you want to call it) we can
negotiate the support with the client and inject a layer between the
message and TCP layers that identifies unique streams of data channels. In
the interim, the naive approach to multiple channels is to open a second
socket. The important thing is that the message layer doesn't know
and doesn't care.
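
A sketch of what "opaque to the message layer" could look like in code (the interfaces below are hypothetical, just to illustrate the layering): the message layer only sees a stream-like channel, and whether that channel is its own socket or a multiplexed sub-stream is decided underneath.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical interfaces sketched to illustrate the layering argument.
// The message layer writes whole messages to a Channel and never learns whether
// the bytes ride on a dedicated socket or on a multiplexed sub-stream.
interface Channel {
    OutputStream out();
    InputStream in();
    void close() throws IOException;
}

interface ChannelProvider {
    Channel acquire() throws IOException;  // naive impl: open or borrow a socket
    void release(Channel channel);         // advanced impl: hand a sub-stream back to the mux
}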


> 4) Following is an alternative definition with these characteristics:
>
> - Serialized data can either be primitive or encoded values.  Encoded
> values are chunked as needed to break up large objects into a series of
> smaller parts.
>
+1

> - Because values can be chunked, the size field is removed.  This allows
> the message to be streamed to the socket incrementally.
>
+1

> - The apiVersion is removed because we can just define a new body type
> with a new apiId (e.g. GetRequest2 with apiId = 1292).
>
+1. Think of the message as a class: you don't want a class that has
more than a single personality. If the first argument to your class
(version) is the personality, then you need to think about a new class. You
don't want the writer of the protocol to have to deduce the personality of
the object based on an argument and then have to decide which fields are
required, optional, or obsolete. By making a new message you strongly type
the messages both in definition and in implementation.


> - The GetRequest tells the server what kind of encoding the client is able
> to understand.
>
+1. I would suggest that a default ordered list be established at the initial
handshake. If a list is not provided at handshake, then ALL encodings are supported.
Then, on individual request messages, if a list of encodings is given it
overrides the allowed list for that single request. If no list is
provided on the request, the handshake-negotiated list is assumed. If a
value being returned is not encoded in any of the encodings listed, then it
is transcoded to the highest-priority encoding with an available transcoder
between the source and destination encoding.
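
Spelling that precedence out as rough Java (all names here are hypothetical) might make it clearer; the per-request list wins, an absent request list means the handshake list, and an absent handshake list means all encodings:

import java.util.List;

// Sketch of the suggested precedence: the request list overrides the handshake list,
// no handshake list means ALL encodings, and a value not already in an accepted
// encoding is transcoded to the highest-priority accepted encoding for which a
// transcoder from the source encoding exists. All types here are hypothetical.
final class EncodingNegotiation {

    interface Transcoders {
        boolean canTranscode(String from, String to);
        byte[] transcode(String from, String to, byte[] payload);
    }

    static List<String> effectiveEncodings(List<String> handshakeList,
                                           List<String> requestList,
                                           List<String> allEncodings) {
        if (requestList != null && !requestList.isEmpty()) {
            return requestList;           // per-request list overrides the handshake list
        }
        if (handshakeList != null && !handshakeList.isEmpty()) {
            return handshakeList;         // otherwise fall back to the negotiated list
        }
        return allEncodings;              // no list at handshake: ALL are supported
    }

    static byte[] encodeForClient(String sourceEncoding, byte[] payload,
                                  List<String> accepted, Transcoders transcoders) {
        if (accepted.contains(sourceEncoding)) {
            return payload;               // already in an acceptable encoding
        }
        for (String target : accepted) {  // the accepted list is priority-ordered
            if (transcoders.canTranscode(sourceEncoding, target)) {
                return transcoders.transcode(sourceEncoding, target, payload);
            }
        }
        throw new IllegalStateException("no transcoder available from " + sourceEncoding);
    }
}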

GetRequest => 0 acceptEncoding key
> 0 => the API id
> acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
> key => EncodedValue
>
Change: acceptedEncodings => encodingId*
Would it make sense to make 'key' a 'key+', or do a GetAllRequest and
GetAllResponse vary that much from GetRequest and GetResponse?

PutRequest => 2 eventId key value
> 2 => the API id
> eventId => clientId threadId sequenceId
> clientId => string
> threadId => integer
> sequenceId => integer
> key => EncodedValue
> value => EncodedValue
>

The eventId is really just a once token, right? Meaning that it's rather
opaque to the server and intended to keep the server from replaying a
request that the client may have retried but that was actually successful. If
it is opaque to the server, then why encode all these specific identifiers?
It seems to me it could be optional, for one, and could simply be a variant int
or byte[]. The server just needs to stash the once tokens and make sure it
doesn't get a duplicate on this client stream.
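
If the eventId were treated as an opaque once token, the server-side check could be as small as this sketch (hypothetical names; a real implementation would also bound or expire the set):

import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "once token" alternative: the server only remembers which opaque
// tokens it has already seen on this client stream and drops retried duplicates.
final class OnceTokenSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true the first time a token is seen, false for a retried duplicate. */
    boolean firstTime(byte[] token) {
        return seen.add(Base64.getEncoder().encodeToString(token));
    }
}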

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Dan Smith
+1 to what Anthony has laid out! I think this is a better way to handle
value encodings, and it's also better to be putting message specific
details like event id with those messages.

I do wonder whether this proposal actually needs metadata headers at all?
What will eventually go in there?

-Dan

There’s some really good feedback and discussion in this thread!  Here are
> a few thoughts:
>
> 1) Optional metadata should be used for fields that are generally
> applicable across all messages.  If a metadata field is required or only
> applies to a small set of messages, it should become part of a message
> definition.  Of course there’s some grey area here.
>
> 2) I think we should pull out the message fragmentation support to avoid
> some significant complexity.  We can later add a fragmentation / envelope
> layer on top without disrupting the current proposal.  I do think we should
> add the capability for chunking data (more on that below).
>
> 3) I did not find any discussion of message pipelining (queuing multiple
> requests on a socket without waiting for a response) or out-of-order
> responses.  What is the plan for these capabilities and how will that
> affect consistency?  What about retries?
>
> 4) Following is an alternative definition with these characteristics:
>
> - Serialized data can either be primitive or encoded values.  Encoded
> values are chunked as needed to break up large objects into a series of
> smaller parts.
> - Because values can be chunked, the size field is removed.  This allows
> the message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type
> with a new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able
> to understand.
> - The metadata map is not used for fields that belong in the message
> body.  I think it’s much easier to write a spec without if statements :-)
>
> Message => MessageHeader MessageBody
>
> MessageHeader => correlationId metadata
> correlationId => integer
> metadata => count (key value)*
> count => integer
> key => string
> value => string
>
> MessageBody => apiId body
> apiId => integer
> body => (see specific definitions)
>
> GetRequest => 0 acceptEncoding key
> 0 => the API id
> acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
> key => EncodedValue
>
> GetResponse => 1 value
> 1 => the API id
> value => EncodedValue
>
> PutRequest => 2 eventId key value
> 2 => the API id
> eventId => clientId threadId sequenceId
> clientId => string
> threadId => integer
> sequenceId => integer
> key => EncodedValue
> value => EncodedValue
>
> EncodedValue => encoding (boolean | integer | number | string | ((length
> bytes)* 0))
> encoding => (define some encodings for byte[], JSON, PDX, *, etc)
> boolean => TRUE or FALSE
> integer => a signed integer value
> number => a decimal value corresponding to IEEE 754
> string => UTF-8 text
> bytes => arbitrary byte[] that can be chunked
>
>
> Anthony
>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Hitesh Khamesra
Good point, Dan!! That needs to be documented.


  From: Dan Smith <dsm...@pivotal.io>
 To: dev@geode.apache.org 
 Sent: Wednesday, May 3, 2017 5:31 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
Okay, but how do I, as an implementer of a driver, know which messages
need an event id and which don't? It seems like maybe this belongs with
those message types, rather than in a generic header. Or maybe you need to
start organizing messages into classes - e.g. messages that change state and
messages that don't - and abstracting out commonality.

It's also not clear exactly what the event id should be set to. When do I
change the sequence id? Does it have to be monotonically increasing? What
should the uniqueId be?

-Dan

On Wed, May 3, 2017 at 5:07 PM, Udo Kohlmeyer <ukohlme...@pivotal.io> wrote:

> Correct,
>
> I did miss that. @Dan, if you look at https://cwiki.apache.org/confl
> uence/display/GEODE/Message+Structure+and+Definition#Messa
> geStructureandDefinition-MetaDataforRequests specifies how we provide
> EventId information.
>
>
>
> On 5/3/17 09:53, Bruce Schuchardt wrote:
>
>> I believe Hitesh put EventId in the metadata section.
>>
>> On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
>>
>>> We are considering the function service, but again, this should not
>>> detract from the proposed message specification proposal.
>>>
>>> You are also correct in your observation of list of error codes not
>>> being complete nor exhaustive. Maybe the first page needs to highlight that
>>> this is a proposal and does not contain all the error codes that we could
>>> per api.
>>>
>>> As for the EventId, we will look into this and update the document
>>> accordingly.
>>>
>>> --Udo
>>>
>>>
>>> On 5/2/17 13:42, Dan Smith wrote:
>>>
>>>> I guess the value of building other messages on top of the function
>>>> service mostly comes into play when we start talking about smarter clients
>>>> that can do single hop. At that point it's really nice to have have a layer
>>>> that lets us send a message to a single primary, or all of the members that
>>>> host a region etc. It is also nice that right now if I add new function
>>>> that functionality becomes available to gfsh, REST, Java, and C++
>>>> developers automatically.
>>>>
>>>> I do agree that the new protocol could build in these concepts, and
>>>> doesn't necessarily have to use function execution to achieve the same
>>>> results. But do at least consider whether new developers will want to add
>>>> new functionality to the server via functions or via your this new
>>>> protocol. If it's harder to use the new protocol than to write a new
>>>> function and invoke it from the client, then I think we've done something
>>>> wrong.
>>>>
>>>>
>>>> A couple of other comments, now that I've looked a little more:
>>>>
>>>> 1) The list of error codes <https://cwiki.apache.org/conf
>>>> luence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> seems
>>>> really incomplete. It looks like we've picked a few of the possible
>>>> exceptions geode could throw and assigned them integer ids? What is the
>>>> rational for the exceptions that are included here vs. other exceptions?
>>>> Also, not all messages would need to return these error codes.
>>>>
>>>> 2) The existing protocol has some functionality even for basic puts
>>>> that is not represented here. Client generate an event id that is
>>>> associated with the put and send that to the server. These event ids are
>>>> used to guarantee that if a client does put (A, 0) followed by put (A, 1),
>>>> the resulting value will always be 1, even if the client timed out and
>>>> retried put (A, 0). The event id prevents the lingered put that timed out
>>>> on the server from affecting the state. I'm not saying the new protocol has
>>>> to support this sort of behavior, but you might want to consider whether
>>>> the current protocol should specify anything about how events are retried.
>>>>
>>>> -Dan
>>>>
>>>
>>>
>>>
>>
>

   

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Anthony Baker

> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan  wrote:
> 
> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hitesh...@yahoo.com.invalid> wrote:
> 
>> Absolutely it's an implementation detail.
>> 
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?

There’s some really good feedback and discussion in this thread!  Here are a 
few thoughts:

1) Optional metadata should be used for fields that are generally applicable 
across all messages.  If a metadata field is required or only applies to a 
small set of messages, it should become part of a message definition.  Of 
course there’s some grey area here.

2) I think we should pull out the message fragmentation support to avoid some 
significant complexity.  We can later add a fragmentation / envelope layer on 
top without disrupting the current proposal.  I do think we should add the 
capability for chunking data (more on that below).

3) I did not find any discussion of message pipelining (queuing multiple 
requests on a socket without waiting for a response) or out-of-order responses. 
 What is the plan for these capabilities and how will that affect consistency?  
What about retries?

4) Following is an alternative definition with these characteristics:

- Serialized data can either be primitive or encoded values.  Encoded values 
are chunked as needed to break up large objects into a series of smaller parts.
- Because values can be chunked, the size field is removed.  This allows the 
message to be streamed to the socket incrementally.
- The apiVersion is removed because we can just define a new body type with a 
new apiId (e.g. GetRequest2 with apiId = 1292).
- The GetRequest tells the server what kind of encoding the client is able to 
understand.
- The metadata map is not used for fields that belong in the message body.  I 
think it’s much easier to write a spec without if statements :-)

Message => MessageHeader MessageBody

MessageHeader => correlationId metadata
correlationId => integer
metadata => count (key value)*
count => integer
key => string
value => string

MessageBody => apiId body
apiId => integer
body => (see specific definitions)

GetRequest => 0 acceptEncoding key
0 => the API id
acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
key => EncodedValue

GetResponse => 1 value
1 => the API id
value => EncodedValue

PutRequest => 2 eventId key value
2 => the API id
eventId => clientId threadId sequenceId
clientId => string
threadId => integer
sequenceId => integer
key => EncodedValue
value => EncodedValue

EncodedValue => encoding (boolean | integer | number | string | ((length 
bytes)* 0))
encoding => (define some encodings for byte[], JSON, PDX, *, etc)
boolean => TRUE or FALSE
integer => a signed integer value
number => a decimal value corresponding to IEEE 754
string => UTF-8 text
bytes => arbitrary byte[] that can be chunked
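
To illustrate the streaming point, here is a rough sketch of a GetRequest written straight to the socket per the grammar above, with no message-level size field to compute up front; the fixed-width integers and the single-chunk key are assumptions, since the grammar leaves the primitive encodings open:

import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only: streams a GetRequest incrementally per the grammar above.
// DataOutputStream's fixed-width ints are an assumption, not part of the proposal.
final class GetRequestWriterSketch {
    static void write(DataOutputStream out, int correlationId,
                      int acceptEncoding, int keyEncoding, byte[] keyBytes) throws IOException {
        // MessageHeader => correlationId metadata
        out.writeInt(correlationId);
        out.writeInt(0);              // metadata count: no optional metadata on this request

        // MessageBody => apiId body, where GetRequest has apiId 0
        out.writeInt(0);              // apiId = 0 (GetRequest)
        out.writeInt(acceptEncoding); // acceptEncoding (some agreed encoding id)

        // key => EncodedValue => encoding ((length bytes)* 0)
        out.writeInt(keyEncoding);
        out.writeInt(keyBytes.length);
        out.write(keyBytes);
        out.writeInt(0);              // zero-length chunk ends the EncodedValue
        out.flush();
    }
}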


Anthony



Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Dan Smith
Okay, but how do I, as an implementer of a driver, know which messages
need an event id and which don't? It seems like maybe this belongs with
those message types, rather than in a generic header. Or maybe you need to
start organizing messages into classes - e.g. messages that change state and
messages that don't - and abstracting out commonality.

It's also not clear exactly what the event id should be set to. When do I
change the sequence id? Does it have to be monotonically increasing? What
should the uniqueId be?

-Dan

On Wed, May 3, 2017 at 5:07 PM, Udo Kohlmeyer  wrote:

> Correct,
>
> I did miss that. @Dan, if you look at https://cwiki.apache.org/confl
> uence/display/GEODE/Message+Structure+and+Definition#Messa
> geStructureandDefinition-MetaDataforRequests specifies how we provide
> EventId information.
>
>
>
> On 5/3/17 09:53, Bruce Schuchardt wrote:
>
>> I believe Hitesh put EventId in the metadata section.
>>
>> On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
>>
>>> We are considering the function service, but again, this should not
>>> detract from the proposed message specification proposal.
>>>
>>> You are also correct in your observation of list of error codes not
>>> being complete nor exhaustive. Maybe the first page needs to highlight that
>>> this is a proposal and does not contain all the error codes that we could
>>> per api.
>>>
>>> As for the EventId, we will look into this and update the document
>>> accordingly.
>>>
>>> --Udo
>>>
>>>
>>> On 5/2/17 13:42, Dan Smith wrote:
>>>
 I guess the value of building other messages on top of the function
 service mostly comes into play when we start talking about smarter clients
 that can do single hop. At that point it's really nice to have have a layer
 that lets us send a message to a single primary, or all of the members that
 host a region etc. It is also nice that right now if I add new function
 that functionality becomes available to gfsh, REST, Java, and C++
 developers automatically.

 I do agree that the new protocol could build in these concepts, and
 doesn't necessarily have to use function execution to achieve the same
 results. But do at least consider whether new developers will want to add
 new functionality to the server via functions or via your this new
 protocol. If it's harder to use the new protocol than to write a new
 function and invoke it from the client, then I think we've done something
 wrong.


 A couple of other comments, now that I've looked a little more:

 1) The list of error codes  seems
 really incomplete. It looks like we've picked a few of the possible
 exceptions geode could throw and assigned them integer ids? What is the
 rational for the exceptions that are included here vs. other exceptions?
 Also, not all messages would need to return these error codes.

 2) The existing protocol has some functionality even for basic puts
 that is not represented here. Client generate an event id that is
 associated with the put and send that to the server. These event ids are
 used to guarantee that if a client does put (A, 0) followed by put (A, 1),
 the resulting value will always be 1, even if the client timed out and
 retried put (A, 0). The event id prevents the lingered put that timed out
 on the server from affecting the state. I'm not saying the new protocol has
 to support this sort of behavior, but you might want to consider whether
 the current protocol should specify anything about how events are retried.

 -Dan

>>>
>>>
>>>
>>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Udo Kohlmeyer

Correct,

I did miss that. @Dan,
https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-MetaDataforRequests
specifies how we provide EventId information.



On 5/3/17 09:53, Bruce Schuchardt wrote:

I believe Hitesh put EventId in the metadata section.

On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
We are considering the function service, but again, this should not 
detract from the proposed message specification proposal.


You are also correct in your observation of list of error codes not 
being complete nor exhaustive. Maybe the first page needs to 
highlight that this is a proposal and does not contain all the error 
codes that we could per api.


As for the EventId, we will look into this and update the document 
accordingly.


--Udo


On 5/2/17 13:42, Dan Smith wrote:
I guess the value of building other messages on top of the function 
service mostly comes into play when we start talking about smarter 
clients that can do single hop. At that point it's really nice to 
have have a layer that lets us send a message to a single primary, 
or all of the members that host a region etc. It is also nice that 
right now if I add new function that functionality becomes available 
to gfsh, REST, Java, and C++ developers automatically.


I do agree that the new protocol could build in these concepts, and 
doesn't necessarily have to use function execution to achieve the 
same results. But do at least consider whether new developers will 
want to add new functionality to the server via functions or via 
your this new protocol. If it's harder to use the new protocol than 
to write a new function and invoke it from the client, then I think 
we've done something wrong.



A couple of other comments, now that I've looked a little more:

1) The list of error codes 
 
seems really incomplete. It looks like we've picked a few of the 
possible exceptions geode could throw and assigned them integer ids? 
What is the rational for the exceptions that are included here vs. 
other exceptions? Also, not all messages would need to return these 
error codes.


2) The existing protocol has some functionality even for basic puts 
that is not represented here. Client generate an event id that is 
associated with the put and send that to the server. These event ids 
are used to guarantee that if a client does put (A, 0) followed by 
put (A, 1), the resulting value will always be 1, even if the client 
timed out and retried put (A, 0). The event id prevents the lingered 
put that timed out on the server from affecting the state. I'm not 
saying the new protocol has to support this sort of behavior, but 
you might want to consider whether the current protocol should 
specify anything about how events are retried.


-Dan









Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Bruce Schuchardt
I agree with Dan that the spec will need to deal with the effects of 
retrying an operation.


On 5/3/2017 at 10:58 AM, Hitesh Khamesra wrote:

(Sorry: one more attempt to format this message)

Here are the few things we need to consider..


1. key, value, callbackarg can be required to interpret as JSON-to-pdx.
2. client calls "get/getall" api and want return value as JSON. Value was 
serialized as pdx.
3. This behavior should be optional, if possible no overhead for others.
4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about 
this. And may be worth to consider it.

Thus my initial thought was client should indicate this feature at message 
level(metadata), saying, convert pdx value to json or vice-versa.


Any thoughts?
Thanks.
HItesh




From: Hitesh Khamesra <hitesh...@yahoo.com.INVALID>
To: "dev@geode.apache.org" <dev@geode.apache.org>
Sent: Wednesday, May 3, 2017 10:01 AM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



Here are the few things we need to consider..
1. key, value, callbackarg can be required to interpret as JSON-to-pdx.
2. client calls "get/getall" api and want return value as JSON. Value was serialized as pdx.
3. This behavior should be optional, if possible no overhead for others.
4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about this. And may be worth to consider it.
Thus my initial thought was client should indicate this feature at message 
level(metadata), saying, convert pdx value to json or vice-versa.
Any thoughts?
Thanks.
Hitesh


   From: Jacob Barrett <jbarr...@pivotal.io>

To: dev@geode.apache.org
Cc: Udo Kohlmeyer <ukohlme...@pivotal.io>
Sent: Tuesday, May 2, 2017 8:11 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.


Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone


On May 2, 2017, at 1:42 PM, Dan Smith <dsm...@pivotal.io> wrote:

I guess the value of building other messages on top of the function service
mostly comes into play when we start talking about smarter clients that can
do single hop. At that point it's really nice to have have a layer that
lets us send a message to a single primary, or all of the members that host
a region etc. It is also nice that right now if I add new function that
functionality becomes available to gfsh, REST, Java, and C++ developers
automatically.

I do agree that the new protocol could build in these concepts, and doesn't
necessarily have to use function execution to achieve the same results. But
do at least consider whether new developers will want to add new
functionality to the server via functions or via your this new protocol. If
it's harder to use the new protocol than to write a new function and invoke
it from the client, then I think we've done something wrong.


A couple of other comments, now that I've looked a little more:

1) The list of error codes
<https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
seems really incomplete. It looks like we've picked a few of the possible
exceptions geode could throw and assigned them integer ids? What is the
rational for the exceptions that are included here vs. other exceptions?
Also, not all messages would need to return these error codes.

2) The existing protocol has some functionality even for basic puts that is
not represented here. Client generate an event id that is associated with
the put and send that to the server. These event ids are used to guarantee
that if a client does put (A, 0) followed by put (A, 1), the resulting
value will always be 1, even if the client timed out and retried put (A,
0). The event id prevents the lingered put that timed out on the server
from affecting the state. I'm not saying the new protocol has to support
this sort of behavior, but you might want to consider whether the current
protocol should specify anything about how events are retried.

-Dan




Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Michael Stolz
The TCP fragmentation is fine for what it is, but it is *not* paging, and
paging has long been something that we have wanted to get around to.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Wed, May 3, 2017 at 1:33 PM, Galen M O'Sullivan 
wrote:

> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hitesh...@yahoo.com.invalid> wrote:
>
> > Absolutely it's an implementation detail.
> >
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Galen M O'Sullivan
On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
hitesh...@yahoo.com.invalid> wrote:

> Absolutely it's an implementation detail.
>
This doesn't answer Dan's comment. Do you think fragmentation should be
taken care of by the TCP layer or the protocol should deal with it
specifically?


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Hitesh Khamesra
(Sorry: one more attempt to format this message)

Here are the few things we need to consider..


1. key, value, callbackarg may need to be interpreted as JSON-to-pdx.
2. client calls the "get/getall" api and wants the return value as JSON, but the value was
serialized as pdx.
3. This behavior should be optional, with no overhead for others if possible.
4. the "putAll api" can have mixed value types (JSON and numbers). Dan raised this, and it
may be worth considering.

Thus my initial thought was that the client should indicate this feature at the message
level (metadata), saying: convert the pdx value to json or vice-versa.
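
As a purely illustrative example of what that optional metadata could look like, using the header's key/value map and an invented key name (nothing below is part of the proposal):

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the metadata key below is invented, not part of the proposal.
// A client that wants PDX values returned as JSON adds a single optional metadata
// entry; clients that don't care pay nothing because their map stays empty.
final class ValueFormatMetadataSketch {
    static Map<String, String> requestJsonValues() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("value-format", "JSON");  // hypothetical key: "return PDX values as JSON"
        return metadata;
    }
}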


Any thoughts?
Thanks.
HItesh




From: Hitesh Khamesra <hitesh...@yahoo.com.INVALID>
To: "dev@geode.apache.org" <dev@geode.apache.org> 
Sent: Wednesday, May 3, 2017 10:01 AM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



Here are the few things we need to consider..
1. key, value, callbackarg can be required to interpret as JSON-to-pdx.
2. client calls "get/getall" api and want return value as JSON. Value was serialized as pdx.
3. This behavior should be optional, if possible no overhead for others.
4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about this. And may be worth to consider it.
Thus my initial thought was client should indicate this feature at message 
level(metadata), saying, convert pdx value to json or vice-versa.
Any thoughts?
Thanks.
Hitesh


  From: Jacob Barrett <jbarr...@pivotal.io>

To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <ukohlme...@pivotal.io>
Sent: Tuesday, May 2, 2017 8:11 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
  
I agree completely with Dan. There is no reason to have flags for value 
encoding type in the message. I would argue that should be part of the value 
serialization layer. If something was placed in the message layer it should be 
more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <dsm...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Hitesh Khamesra
Here are the few things we need to consider..
1. key, value, callbackarg can be required to interpret as JSON-to-pdx.
2. client calls "get/getall" api and want return value as JSON. Value was serialized as pdx.
3. This behavior should be optional, if possible no overhead for others.
4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about this. And may be worth to consider it.
Thus my initial thought was client should indicate this feature at message 
level(metadata), saying, convert pdx value to json or vice-versa.
Any thoughts?
Thanks.
Hitesh


  From: Jacob Barrett <jbarr...@pivotal.io>
 To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <ukohlme...@pivotal.io>
 Sent: Tuesday, May 2, 2017 8:11 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I agree completely with Dan. There is no reason to have flags for value 
encoding type in the message. I would argue that should be part of the value 
serialization layer. If something was placed in the message layer it should be 
more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <dsm...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan

   

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Bruce Schuchardt

I believe Hitesh put EventId in the metadata section.

On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
We are considering the function service, but again, this should not 
detract from the proposed message specification proposal.


You are also correct in your observation of list of error codes not 
being complete nor exhaustive. Maybe the first page needs to highlight 
that this is a proposal and does not contain all the error codes that 
we could per api.


As for the EventId, we will look into this and update the document 
accordingly.


--Udo


On 5/2/17 13:42, Dan Smith wrote:
I guess the value of building other messages on top of the function 
service mostly comes into play when we start talking about smarter 
clients that can do single hop. At that point it's really nice to 
have have a layer that lets us send a message to a single primary, or 
all of the members that host a region etc. It is also nice that right 
now if I add new function that functionality becomes available to 
gfsh, REST, Java, and C++ developers automatically.


I do agree that the new protocol could build in these concepts, and 
doesn't necessarily have to use function execution to achieve the 
same results. But do at least consider whether new developers will 
want to add new functionality to the server via functions or via your 
this new protocol. If it's harder to use the new protocol than to 
write a new function and invoke it from the client, then I think 
we've done something wrong.



A couple of other comments, now that I've looked a little more:

1) The list of error codes 
 
seems really incomplete. It looks like we've picked a few of the 
possible exceptions geode could throw and assigned them integer ids? 
What is the rational for the exceptions that are included here vs. 
other exceptions? Also, not all messages would need to return these 
error codes.


2) The existing protocol has some functionality even for basic puts 
that is not represented here. Client generate an event id that is 
associated with the put and send that to the server. These event ids 
are used to guarantee that if a client does put (A, 0) followed by 
put (A, 1), the resulting value will always be 1, even if the client 
timed out and retried put (A, 0). The event id prevents the lingered 
put that timed out on the server from affecting the state. I'm not 
saying the new protocol has to support this sort of behavior, but you 
might want to consider whether the current protocol should specify 
anything about how events are retried.


-Dan







Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-03 Thread Hitesh Khamesra
We have a version at the api (put, get, etc.) level: 
https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-RequestHeader.
The client will connect to the gemfire server by sending the "byte". That can be 
treated as the version for message serialization.

  From: Michael Stolz <mst...@pivotal.io>
 To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <ukohlme...@pivotal.io>
 Sent: Wednesday, May 3, 2017 8:55 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I'm not seeing any mention of versioning of the serialization protocol.
Versioning is critical to be able to support change over time. We must
version each layer of serialization. The transport message needs versions,
the payload serialization needs versions.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Tue, May 2, 2017 at 8:11 PM, Jacob Barrett <jbarr...@pivotal.io> wrote:

> I agree completely with Dan. There is no reason to have flags for value
> encoding type in the message. I would argue that should be part of the
> value serialization layer. If something was placed in the message layer it
> should be more generic and allow for an unrestricted set of encodings by
> some ID.
>
> Object {
> variant ID codec;
> byte[] payload;
> }
>
>
> -Jake
>
>
> Sent from my iPhone
>
> > On May 2, 2017, at 1:42 PM, Dan Smith <dsm...@pivotal.io> wrote:
> >
> > I guess the value of building other messages on top of the function
> service
> > mostly comes into play when we start talking about smarter clients that
> can
> > do single hop. At that point it's really nice to have have a layer that
> > lets us send a message to a single primary, or all of the members that
> host
> > a region etc. It is also nice that right now if I add new function that
> > functionality becomes available to gfsh, REST, Java, and C++ developers
> > automatically.
> >
> > I do agree that the new protocol could build in these concepts, and
> doesn't
> > necessarily have to use function execution to achieve the same results.
> But
> > do at least consider whether new developers will want to add new
> > functionality to the server via functions or via your this new protocol.
> If
> > it's harder to use the new protocol than to write a new function and
> invoke
> > it from the client, then I think we've done something wrong.
> >
> >
> > A couple of other comments, now that I've looked a little more:
> >
> > 1) The list of error codes
> > <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-
> ErrorCodeDefinitions>
> > seems really incomplete. It looks like we've picked a few of the possible
> > exceptions geode could throw and assigned them integer ids? What is the
> > rational for the exceptions that are included here vs. other exceptions?
> > Also, not all messages would need to return these error codes.
> >
> > 2) The existing protocol has some functionality even for basic puts that
> is
> > not represented here. Client generate an event id that is associated with
> > the put and send that to the server. These event ids are used to
> guarantee
> > that if a client does put (A, 0) followed by put (A, 1), the resulting
> > value will always be 1, even if the client timed out and retried put (A,
> > 0). The event id prevents the lingered put that timed out on the server
> > from affecting the state. I'm not saying the new protocol has to support
> > this sort of behavior, but you might want to consider whether the current
> > protocol should specify anything about how events are retried.
> >
> > -Dan
>


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Jacob Barrett
I agree completely with Dan. There is no reason to have flags for value 
encoding type in the message. I would argue that should be part of the value 
serialization layer. If something was placed in the message layer it should be 
more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}
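
In Java terms that wrapper, plus the lookup by ID, might sketch out roughly like this (hypothetical names, not an actual Geode API):

import java.util.Map;

// Sketch only: the message layer carries an encoding id and opaque bytes; a
// (hypothetical) registry maps ids to codecs, so the set of encodings stays open-ended.
final class EncodedPayloadSketch {
    final int codecId;      // identifies the serialization (PDX, JSON, raw bytes, ...)
    final byte[] payload;   // opaque to the message layer

    EncodedPayloadSketch(int codecId, byte[] payload) {
        this.codecId = codecId;
        this.payload = payload;
    }

    interface Codec {
        Object decode(byte[] payload);
    }

    Object decode(Map<Integer, Codec> registry) {
        return registry.get(codecId).decode(payload);
    }
}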


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith  wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> 
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Udo Kohlmeyer
We are considering the function service, but again, this should not 
detract from the proposed message specification proposal.


You are also correct in your observation that the list of error codes is not 
complete nor exhaustive. Maybe the first page needs to highlight 
that this is a proposal and does not contain all the error codes that we 
could return per api.


As for the EventId, we will look into this and update the document 
accordingly.


--Udo


On 5/2/17 13:42, Dan Smith wrote:
I guess the value of building other messages on top of the function 
service mostly comes into play when we start talking about smarter 
clients that can do single hop. At that point it's really nice to have 
have a layer that lets us send a message to a single primary, or all 
of the members that host a region etc. It is also nice that right now 
if I add new function that functionality becomes available to gfsh, 
REST, Java, and C++ developers automatically.


I do agree that the new protocol could build in these concepts, and 
doesn't necessarily have to use function execution to achieve the same 
results. But do at least consider whether new developers will want to 
add new functionality to the server via functions or via your this new 
protocol. If it's harder to use the new protocol than to write a new 
function and invoke it from the client, then I think we've done 
something wrong.



A couple of other comments, now that I've looked a little more:

1) The list of error codes 
 
seems really incomplete. It looks like we've picked a few of the 
possible exceptions geode could throw and assigned them integer ids? 
What is the rational for the exceptions that are included here vs. 
other exceptions? Also, not all messages would need to return these 
error codes.


2) The existing protocol has some functionality even for basic puts 
that is not represented here. Client generate an event id that is 
associated with the put and send that to the server. These event ids 
are used to guarantee that if a client does put (A, 0) followed by put 
(A, 1), the resulting value will always be 1, even if the client timed 
out and retried put (A, 0). The event id prevents the lingered put 
that timed out on the server from affecting the state. I'm not saying 
the new protocol has to support this sort of behavior, but you might 
want to consider whether the current protocol should specify anything 
about how events are retried.


-Dan




Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Dan Smith
I guess the value of building other messages on top of the function service
mostly comes into play when we start talking about smarter clients that can
do single hop. At that point it's really nice to have a layer that
lets us send a message to a single primary, or all of the members that host
a region, etc. It is also nice that right now if I add a new function, that
functionality becomes available to gfsh, REST, Java, and C++ developers
automatically.

I do agree that the new protocol could build in these concepts, and doesn't
necessarily have to use function execution to achieve the same results. But
do at least consider whether new developers will want to add new
functionality to the server via functions or via this new protocol. If
it's harder to use the new protocol than to write a new function and invoke
it from the client, then I think we've done something wrong.


A couple of other comments, now that I've looked a little more:

1) The list of error codes

seems really incomplete. It looks like we've picked a few of the possible
exceptions geode could throw and assigned them integer ids? What is the
rationale for the exceptions that are included here vs. other exceptions?
Also, not all messages would need to return these error codes.

2) The existing protocol has some functionality even for basic puts that is
not represented here. Clients generate an event id that is associated with
the put and send that to the server. These event ids are used to guarantee
that if a client does put (A, 0) followed by put (A, 1), the resulting
value will always be 1, even if the client timed out and retried put (A,
0). The event id prevents the lingered put that timed out on the server
from affecting the state. I'm not saying the new protocol has to support
this sort of behavior, but you might want to consider whether the current
protocol should specify anything about how events are retried.

-Dan


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Hitesh Khamesra
Absolutely it's an implementation detail.
JSON: Surely we can consider a ValueHeader. But then every client (and message) 
needs to send that. Using metadata it's optional.


  From: Dan Smith <dsm...@pivotal.io>
 To: Udo Kohlmeyer <ukohlme...@pivotal.io> 
Cc: dev@geode.apache.org
 Sent: Tuesday, May 2, 2017 11:39 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
> IsPartialMessage: This flag gives us ability to send partial message
> without serializing the whole key-value(request). lets say I execute
> function at server and function just returns "arraylist of object". And the
> serialized size of ""arraylist of object"" is quite big( > 2gb).
>

My point about these fields is that it really seems like this stuff should
be handled by different layers. Ideally you would have a fragmentation
layer that is invisible to people writing specific messages, so that
messages are automatically fragmented if they get too large. Think about how
a TCP socket works - you just write data and it is automatically
fragmented. Or are you expecting each individual message type to have its
own way of doing fragmentation, but that it should set this header down in your
protocol layer? That seems really messy.

JSON: this is a feature we want to introduce, where client can send JSON
> string and we want to save that JSON string into pdx.


Same thing here, JSON support sounds great, but having a header field of
JSON_KEY seems like a hacky way to do that. It seems like that might belong
in your ValueHeader.




On Tue, May 2, 2017 at 10:20 AM, Udo Kohlmeyer <ukohlme...@pivotal.io>
wrote:

> Hey Dan,
>
> Imo, having a standardized, versioned definition for GET, PUT, PUTALL,
> etc. message, that is encoded/decoded in a manner that multiple clients
> (written in many other languages) can encode/decode these messages, is
> paramount.
>
> Having the standardized operational messages(GET,PUT,etc.) transported
> using the function service vs a more direct operation handler, that is
> another discussion and is something that should be investigated.
>
> My immediate concerns regarding "normal" operations over the function
> service are:
>
>    1. I don't believe the current function service is "stream" enabled,
>    and would require some potential rework for subscription-based operations
>    2. Can the function service handle the extra load?
>    3. Is the function service "lean" enough to sustain acceptable
>    throughput? The current client/server protocol averages around
>    40,000-50,000 messages/second.
>    4. There are some messages that are passed between the client <->
>    locator. Given that the function service is "server" specific, this
>    approach would not work for locators, where a different transport mechanism
>    is required. (but this is not a show stopper if function service proves to
>    be viable)
>    5. How much effort would be required to make the "old" function
>    service, handle the new messages, ensuring that the current behavior is
>    preserved.
>
> As per a previous discussion we had, I believe that the "function-like"
> behavior (retry, HA, write vs read optimized) can be incorporated into the
> processing layer on the server. In that way all messages can benefit from
> that behavior. In addition to this, if we have a single mechanism that will
> handle messages, retry, HA, read/write optimizations, is preferable to
> having a few "bespoke" implementations. So either approach (new message
> handling) vs function service, will be preferable.
>
> "*The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntactic sugar on top of that.*"
>
> It can be argued that having a standard client/server message, with
> standardized encoding/decoding, is the same as using function execution.
> Both require a little syntactic sugar to add new functionality to an
> already standardized message.
>
> --Udo
> On 5/1/17 17:27, Dan Smith wrote:
>
> I think any new client driver or server we develop might want to
> incorporate function execution at lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
>
> The current func

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Dan Smith
> IsPartialMessage: This flag gives us ability to send partial message
> without serializing the whole key-value(request). lets say I execute
> function at server and function just returns "arraylist of object". And the
> serialized size of ""arraylist of object"" is quite big( > 2gb).
>

My point about these fields is that it really seems like this stuff should
be handled by different layers. Ideally you would have a fragmentation
layer that is invisible to people writing specific messages, so that
messages are automatically fragmented if they get too large. Think about how
a TCP socket works - you just write data and it is automatically
fragmented. Or are you expecting each individual message type to have its
own way of doing fragmentation, but that it should set this header down in your
protocol layer? That seems really messy.
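
A sketch of what such an invisible fragmentation layer could look like (hypothetical; the 4-byte frame header is an assumption): message writers just write bytes, and the layer cuts them into size-limited frames underneath, much like TCP does.

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: an OutputStream that transparently splits whatever the message layer
// writes into frames no larger than maxFrameSize. The frame header format is assumed.
final class FragmentingOutputStream extends OutputStream {
    private final DataOutputStream out;
    private final int maxFrameSize;

    FragmentingOutputStream(OutputStream socketOut, int maxFrameSize) {
        this.out = new DataOutputStream(socketOut);
        this.maxFrameSize = maxFrameSize;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override
    public void write(byte[] data, int offset, int length) throws IOException {
        while (length > 0) {
            int frame = Math.min(length, maxFrameSize);
            out.writeInt(frame);             // frame length header (assumed format)
            out.write(data, offset, frame);  // frame payload
            offset += frame;
            length -= frame;
        }
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }
}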

JSON: this is a feature we want to introduce, where a client can send a JSON
> string and we want to save that JSON string into PDX.


Same thing here, JSON support sounds great, but having a header field of
JSON_KEY seems like a hacky way to do that. It seems like that might belong
in your ValueHeader.
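
As a sketch of what I mean (all names here are hypothetical), the value
itself could carry its serialization format, so "is this JSON?" never leaks
into the top-level message header:

    // Hypothetical sketch only: the value records its own format, and the
    // message just holds opaque encoded values.
    enum ValueFormat { PDX, JSON, PRIMITIVE_BYTES }

    final class EncodedValue {
      final ValueFormat format;
      final byte[] bytes;

      EncodedValue(ValueFormat format, byte[] bytes) {
        this.format = format;
        this.bytes = bytes;
      }
    }

    final class PutRequest {
      final String regionName;
      final EncodedValue key;   // not every message has a key,
      final EncodedValue value; // or exactly one value

      PutRequest(String regionName, EncodedValue key, EncodedValue value) {
        this.regionName = regionName;
        this.key = key;
        this.value = value;
      }
    }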




On Tue, May 2, 2017 at 10:20 AM, Udo Kohlmeyer 
wrote:

> Hey Dan,
>
> Imo, having a standardized, versioned definition for the GET, PUT, PUTALL,
> etc. messages, encoded/decoded in a manner that multiple clients
> (written in many other languages) can handle, is paramount.
>
> Whether the standardized operational messages (GET, PUT, etc.) are
> transported using the function service or a more direct operation handler
> is another discussion and something that should be investigated.
>
> My immediate concerns regarding "normal" operations over the function
> service are:
>
>1. I don't believe the current function service is "stream" enabled,
>and would require some potential rework for subscription-based operations
>2. Can the function service handle the extra load?
>3. Is the function service "lean" enough to sustain acceptable
>throughput? The current client/server protocol averages around
>40,000-50,000 messages/second.
>4. There are some messages that are passed between the client <->
>locator. Given that the function service is "server" specific, this
>approach would not work for locators, where a different transport mechanism
>is required. (but this is not a show stopper if function service proves to
>be viable)
>5. How much effort would be required to make the "old" function
>service handle the new messages while ensuring that the current behavior
>is preserved?
>
> As per a previous discussion we had, I believe that the "function-like"
> behavior (retry, HA, write vs read optimized) can be incorporated into the
> processing layer on the server. In that way all messages can benefit from
> that behavior. In addition, having a single mechanism that handles
> messages, retry, HA, and read/write optimizations is preferable to having
> a few "bespoke" implementations. So whichever approach (new message
> handling or function service) provides that single mechanism will be
> preferable.
>
> "*The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntactic sugar on top of that.*"
>
> It can be argued that having a standard client/server message, with
> standardized encoding/decoding, is the same as using function execution.
> Both require a little syntactic sugar to add new functionality to an
> already standardized message.
>
> --Udo
> On 5/1/17 17:27, Dan Smith wrote:
>
> I think any new client driver or server we develop might want to
> incorporate function execution at a lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
>
> The current function execution API is a little clunky and needs some work.
> But what it does do is provide the fundamental logic to target operations
> at members that host certain keys and retry in the case of failure.
>
> The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntactic sugar on top of that.
>
> -Dan
>
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer 

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Udo Kohlmeyer

Hey Dan,

Imo, having a standardized, versioned definition for the GET, PUT, PUTALL,
etc. messages, encoded/decoded in a manner that multiple clients
(written in many other languages) can handle, is paramount.
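
To illustrate (purely a sketch, with made-up field names), the kind of
standardized, versioned definition I mean is roughly what a serialization
library would generate from an IDL:

    // Illustrative only: plain data holders for a versioned GET message.
    // The version scheme, opcodes and field names are all placeholders.
    final class MessageHeader {
      final short protocolVersion; // bumped whenever the wire format changes
      final short opCode;          // e.g. GET, PUT, PUTALL
      final int correlationId;

      MessageHeader(short protocolVersion, short opCode, int correlationId) {
        this.protocolVersion = protocolVersion;
        this.opCode = opCode;
        this.correlationId = correlationId;
      }
    }

    final class GetRequest {
      final MessageHeader header;
      final String regionName;
      final byte[] key;

      GetRequest(MessageHeader header, String regionName, byte[] key) {
        this.header = header;
        this.regionName = regionName;
        this.key = key;
      }
    }

Any client that can encode and decode those fields, in whatever
serialization framework we settle on, can speak the operation.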


Whether the standardized operational messages (GET, PUT, etc.) are
transported using the function service or a more direct operation handler
is another discussion and something that should be investigated.


My immediate concerns regarding "normal" operations over the function 
service are:


1. I don't believe the current function service is "stream" enabled,
   and would require some potential rework for subscription-based
   operations
2. Can the function service handle the extra load?
3. Is the function service "lean" enough to sustain acceptable
   throughput? The current client/server protocol averages around
   40,000-50,000 messages/second.
4. There are some messages that are passed between the client <->
   locator. Given that the function service is "server" specific, this
   approach would not work for locators, where a different transport
   mechanism is required. (but this is not a show stopper if function
   service proves to be viable)
5. How much effort would be required to make the "old" function
   service handle the new messages while ensuring that the current
   behavior is preserved?

As per a previous discussion we had, I believe that the "function-like"
behavior (retry, HA, write vs read optimized) can be incorporated into the
processing layer on the server. In that way all messages can benefit
from that behavior. In addition, having a single mechanism that handles
messages, retry, HA, and read/write optimizations is preferable to
having a few "bespoke" implementations. So whichever approach (new
message handling or function service) provides that single mechanism
will be preferable.
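
As a rough sketch of what I mean (hypothetical interfaces, not a design),
the processing layer would wrap every operation handler with the same
retry/HA behavior, rather than each message type re-implementing it:

    // Hypothetical sketch: one decorator gives every handler the same retry
    // behavior; failover to another member would hang off the same hook.
    interface OperationHandler {
      byte[] handle(byte[] request) throws Exception;
    }

    final class RetryingHandler implements OperationHandler {
      private final OperationHandler delegate;
      private final int maxAttempts;

      RetryingHandler(OperationHandler delegate, int maxAttempts) {
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
      }

      @Override
      public byte[] handle(byte[] request) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
          try {
            return delegate.handle(request);
          } catch (Exception e) {
            last = e; // retry, or fail over, on the next iteration
          }
        }
        throw last;
      }
    }

A GET handler, a PUT handler and a function-execution handler would all be
registered behind that same wrapper.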


"/The advantage of this approach is that if someone just builds a driver 
that only supports function execution and whatever serialization 
framework is required to serialize function arguments, they already have 
an API that application developers could use to do pretty much anything 
they wanted to do on the server. Having a Region object with methods 
like get and put on it could just be a little syntactic sugar on top of 
that./"


It can be argued that having a standard client/server message, with 
standardized encoding/decoding, is the same as using function execution. 
Both require a little syntactic sugar to add new functionality to an 
already standardized message.


--Udo

On 5/1/17 17:27, Dan Smith wrote:
I think any new client driver or server we develop might want to 
incorporate function execution at a lower level than region operations 
like get and put, etc. We could then easily build operations like GET, 
PUT, PUTALL, etc. on top of that by making them functions. The 
original client protocol isn't designed like that because it pre-dates 
function execution.


The current function execution API is a little clunky and needs some 
work. But what it does do is provide the fundamental logic to target 
operations at members that host certain keys and retry in the case of 
failure.


The advantage of this approach is that if someone just builds a driver 
that only supports function execution and whatever serialization 
framework is required to serialize function arguments, they already 
have an API that application developers could use to do pretty much 
anything they wanted to do on the server. Having a Region object with 
methods like get and put on it could just be a little syntactic sugar 
on top of that.


-Dan

On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer wrote:


Hi there Geode community,

The new Client-Server protocol proposal is available for review.

It can be viewed and commented on
https://cwiki.apache.org/confluence/display/GEODE/New+Client+Server+Protocol



--Udo






Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-02 Thread Anthony Baker
I think the downside of having a single generic message type is that we lose 
“type safety” and some efficiency.  The message definition would essentially 
become:

String functionName;
byte[][] args;

It’s a little more challenging for an author of a Geode Driver to fill in the 
args correctly compared to calling specific methods on a generated stub.  Also, 
if the argument data types are fixed in the message definition we can apply 
efficient encoding techniques automatically (e.g. varint, zigzag, optional).
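
For example (illustrative only), compare what a driver author works with in
the two cases:

    // Illustrative only. With the generic message the driver author must
    // know the order and encoding of every argument; with a typed message
    // the framework can varint/zigzag-encode fields and mark them optional.
    final class GenericFunctionRequest {
      String functionName;
      byte[][] args;        // arg order and encoding live in people's heads
    }

    final class TypedGetRequest {
      String regionName;
      byte[] key;
      long timeoutMillis;   // fixed type, so it can be varint-encoded
    }

Getting the generic form wrong is a runtime failure; getting the typed form
wrong generally doesn't compile.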

I also wonder about the code path efficiency for functions vs get / put.  That 
would be an interesting test.


Anthony

> On May 1, 2017, at 5:27 PM, Dan Smith  wrote:
> 
> I think any new client driver or server we develop might want to
> incorporate function execution at a lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
> 
> The current function execution API is a little clunky and needs some work.
> But what it does do is provide the fundamental logic to target operations
> at members that host certain keys and retry in the case of failure.
> 
> The advantage of this approach is that if someone just builds a driver that
> only supports function execution and whatever serialization framework is
> required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntactic sugar on top of that.
> 
> -Dan
> 
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer 
> wrote:
> 
>> Hi there Geode community,
>> 
>> The new Client-Server protocol proposal is available for review.
>> 
>> It can be viewed and commented on https://cwiki.apache.org/confl
>> uence/display/GEODE/New+Client+Server+Protocol
>> 
>> --Udo
>> 
>> 



Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-01 Thread Hitesh Khamesra
>>>The message header currently is specified to have things like correlation
id, isPartial message, and also metadata about whether the key or the
value is JSON.
IsPartialMessage: This flag gives us the ability to send a partial message
without serializing the whole key-value (request). Let's say I execute a
function at the server and the function just returns an "arraylist of objects",
and the serialized size of that "arraylist of objects" is quite big (> 2gb).
Metadata: Most of the metadata will be optional. JSON: this is a feature we
want to introduce, where a client can send a JSON string and we want to save
that JSON string into PDX.
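
On the server side I would expect this to lean on the existing PDX/JSON
support; a rough sketch (assuming the message handler has access to
something like Geode's JSONFormatter and the target region):

    // Rough sketch, not a design: store a client-supplied JSON string as PDX.
    import org.apache.geode.cache.Region;
    import org.apache.geode.pdx.JSONFormatter;
    import org.apache.geode.pdx.PdxInstance;

    final class JsonPutHandler {
      void handleJsonPut(Region<Object, PdxInstance> region, Object key,
          String json) {
        PdxInstance value = JSONFormatter.fromJSON(json); // JSON -> PDX
        region.put(key, value);
      }
    }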


  From: Dan Smith <dsm...@pivotal.io>
 To: Udo Kohlmeyer <ukohlme...@pivotal.io> 
Cc: dev@geode.apache.org
 Sent: Monday, May 1, 2017 5:53 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I think the current proposal seems to be glomming some things together that
probably belong in different layers.

The message header currently is specified to have things like correlation
id, isPartial message, and also metadata about whether the key or the
value is JSON. Fragmenting messages (isPartialMessage) seems like it
belongs at a lower level, and depending on what transport you are using is
probably already handled (TCP, HTTP). Potentially similar issue for
request/response (correlationId)  - if you were going over http that would
already be handled. Whether a value is JSON seems like it belongs at a
higher level that specifies the value serialization format. Not to mention
not all messages have keys and values, and some have more than one.

I wonder if it would make more sense to organize the message structure
into layers that each have their own responsibility - a fragmentation layer
(maybe not necessary), a request/response layer (maybe not necessary, not
needed for all message types), function execution layer, individual
operations layer (put, get), value serialization layer.

Without having nailed down what underlying serialization framework you are
using, talking about field byte sizes, etc. seems a bit premature.

-Dan

On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <ukohlme...@pivotal.io>
wrote:

> Hi there Geode community,
>
> The new Client-Server protocol proposal is available for review.
>
> It can be viewed and commented on https://cwiki.apache.org/confl
> uence/display/GEODE/New+Client+Server+Protocol
>
> --Udo
>
>


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-01 Thread Dan Smith
I think the current proposal seems to be glomming some things together that
probably belong in different layers.

The message header currently is specified to have things like correlation
id, isPartial message, and also metadata about whether the key or the
value is JSON. Fragmenting messages (isPartialMessage) seems like it
belongs at a lower level, and depending on what transport you are using is
probably already handled (TCP, HTTP). Potentially similar issue for
request/response (correlationId)  - if you were going over http that would
already be handled. Whether a value is JSON seems like it belongs at a
higher level that specifies the value serialization format. Not to mention
not all messages have keys and values, and some have more than one.

I wonder if it would make more sense to organize the message structure
into layers that each have their own responsibility - a fragmentation layer
(maybe not necessary), a request/response layer (maybe not necessary, not
needed for all message types), function execution layer, individual
operations layer (put, get), value serialization layer.
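
As a sketch of the layering I have in mind (interface names are made up,
and a given deployment might drop the layers it doesn't need):

    // Hypothetical layering sketch. Each layer only knows about the one
    // below it; fragmentation and request/response disappear entirely if
    // the transport already provides them.
    interface Transport {                // TCP, HTTP, ...
      void send(byte[] bytes);
      byte[] receive();
    }

    interface FragmentationLayer {       // maybe not necessary
      void sendMessage(byte[] wholeMessage);
      byte[] receiveMessage();           // reassembles fragments
    }

    interface RequestResponseLayer {     // correlation, if the transport lacks it
      byte[] call(byte[] request);
    }

    interface OperationLayer {           // put, get, function execution
      byte[] get(String regionName, byte[] encodedKey);
      void put(String regionName, byte[] encodedKey, byte[] encodedValue);
    }

    interface ValueSerializationLayer {  // PDX, JSON, primitives, ...
      byte[] encode(Object value);
      Object decode(byte[] bytes);
    }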

Without having nailed down what underlying serialization framework you are
using, talking about field byte sizes, etc. seems a bit premature.

-Dan

On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer 
wrote:

> Hi there Geode community,
>
> The new Client-Server protocol proposal is available for review.
>
> It can be viewed and commented on https://cwiki.apache.org/confl
> uence/display/GEODE/New+Client+Server+Protocol
>
> --Udo
>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

2017-05-01 Thread Dan Smith
I think any new client driver or server we develop might want to
incorporate function execution at a lower level than region operations like
get and put, etc. We could then easily build operations like GET, PUT,
PUTALL, etc. on top of that by making them functions. The original client
protocol isn't designed like that because it pre-dates function execution.

The current function execution API is a little clunky and needs some work.
But what it does do is provide the fundamental logic to target operations
at members that host certain keys and retry in the case of failure.

The advantage of this approach is that if someone just builds a driver that
only supports function execution and whatever serialization framework is
required to serialize function arguments, they already have an API that
application developers could use to do pretty much anything they wanted to
do on the server. Having a Region object with methods like get and put on
it could just be a little syntactic sugar on top of that.
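
As a rough illustration of that sugar (the executor interface, function ids
and argument encoding below are all made up, not the real function
execution API):

    // Rough illustration only: a driver-side Region whose get/put are thin
    // wrappers over a generic "execute function" call.
    interface FunctionExecutor {
      // pre-serialized arguments; routing and retry live behind this call
      byte[] executeFunction(String functionId, String regionName, byte[]... args);
    }

    final class DriverRegion {
      private final FunctionExecutor executor;
      private final String regionName;

      DriverRegion(FunctionExecutor executor, String regionName) {
        this.executor = executor;
        this.regionName = regionName;
      }

      byte[] get(byte[] key) {
        return executor.executeFunction("GET", regionName, key);
      }

      void put(byte[] key, byte[] value) {
        executor.executeFunction("PUT", regionName, key, value);
      }
    }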

-Dan

On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer 
wrote:

> Hi there Geode community,
>
> The new Client-Server protocol proposal is available for review.
>
> It can be viewed and commented on https://cwiki.apache.org/confl
> uence/display/GEODE/New+Client+Server+Protocol
>
> --Udo
>
>