Re: codec strategies

2014-05-15 Thread Alan Conway
On Tue, 2014-05-06 at 22:36 -0400, Rafael Schloming wrote:
 I've put together some mock ups of a few different codec strategies
 both to compare from an API/usability perspective and to get a rough
 idea of some of the performance implications of the different choices.
 Please see the attached code for the full details. I'll summarize the
 different strategies below.
 
 The SimpleEncoder is pretty straightforward; the only real point here
 is to use basic types to represent values and thereby minimize the
 amount of intermediate memory and CPU required to use the codec.
 
 
 The DispatchingDecoder works similarly to a SAX-style parser: it
 iterates over the encoded content and dispatches values to a handler.
 
 
 The StreamingDecoder is similar to the DispatchingDecoder, except that
 instead of an internal bytecode loop calling out to a handler, it is
 externally driven by calling into the decoder. This appears to be
 marginally slower than the DispatchingDecoder in the particular
 scenario in the mock up; however, it may have some API benefits, e.g.
 conversions can be done on demand and it is possible to skip over
 uninteresting data rather than parsing it.
 
 
 
 The mock up also includes the same data being encoded/decoded using the
 existing codec (with Clebert's patch).
 
 
 Fair warning, the data I chose to encode/decode is completely
 arbitrary and not intended to be representative at all. That said, the
 numbers I'm getting suggest to me that we can do a whole lot better
 than the current codec if we start with something simple and keep it
 that way. Here is the output I'm getting for a run with a hundred
 million iterations:
 
   simple encode: 4416 millis
   dispatching decode: 3049 millis
   streaming decode: 3243 millis
   existing encode: 9515 millis
   existing decode: 13931 millis
 
 

I vote for DispatchingDecode: it's the simplest, the fastest, and is
based on a well-established parsing pattern with a good track record for
performance (SAX). It's not so hard to ignore data in a handler.

Writing a handler state machine is a bit more complex than writing a
sequence of calls to a stream API, but I think you could encapsulate
most of it in a standard state machine that, given a sequence of type
codes, fills a sequence of variables. I'm not sure about the right way
to do that in Java performance-wise.
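
One possible shape for such a helper, sketched here purely as an
illustration (the class and method names are invented, not part of the
mock up): a reusable handler that stores each callback's argument into a
preallocated primitive slot, so filling a sequence of variables costs no
per-value allocation.

```java
// Hypothetical helper: SAX-style callbacks record values into
// preallocated primitive arrays; no boxing, no per-value garbage.
public class FillingHandler {
    private final int[] ints;
    private final float[] floats;
    private int i, f;

    FillingHandler(int intCount, int floatCount) {
        ints = new int[intCount];
        floats = new float[floatCount];
    }

    // Callbacks just record values in arrival order.
    public void onInt(int value)     { ints[i++] = value; }
    public void onFloat(float value) { floats[f++] = value; }

    public int intAt(int idx)     { return ints[idx]; }
    public float floatAt(int idx) { return floats[idx]; }

    public static void main(String[] args) {
        FillingHandler h = new FillingHandler(2, 1);
        h.onInt(10); h.onFloat(0.5f); h.onInt(20);  // as a decoder would
        System.out.println(h.intAt(1));             // prints 20
    }
}
```

Whether wiring such a helper into a real handler costs anything is
exactly the performance question raised below.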

Hmm. That might be worth another performance test though - if you did
have such tools for making it easy to build handlers, would those tools
introduce a penalty that would make the StreamingDecode look more
attractive...

 Another factor to consider is the difficulty in quantifying the impact
 of generating lots of garbage. In a small benchmark like this there
 isn't a lot of memory pressure, so extra garbage doesn't have a lot of
 impact, however in a real application that would translate into
 increased GC cycles and so might be more of a factor. What I can say
 from watching memory usage under the profiler is that at any given
 point there are typically hundreds of megs worth of garbage Integer
 and UUID instances lying around when the existing codec is running.
 None of the alternative strategies I've included generates any
 garbage.
 
 
 --Rafael
 
 




codec strategies

2014-05-15 Thread Rafael Schloming
I've put together some mock ups of a few different codec strategies both to
compare from an API/usability perspective and to get a rough idea of some
of the performance implications of the different choices. Please see the
attached code for the full details. I'll summarize the different strategies
below.

The SimpleEncoder is pretty straightforward; the only real point here is to
use basic types to represent values and thereby minimize the amount of
intermediate memory and CPU required to use the codec.
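
To make the basic-types idea concrete, here is a minimal sketch (the
class and method names are mine, not the attachment's; the format codes
are AMQP 1.0's int and float codes): values go straight from primitives
into the buffer, with no intermediate Object allocation.

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the SimpleEncoder idea: write a wire-format
// code followed by the primitive value directly into a byte buffer.
public class SimpleEncoderSketch {
    // AMQP 1.0 format codes for a 32-bit int and a 32-bit float.
    static final byte INT_CODE = 0x71;
    static final byte FLOAT_CODE = 0x72;

    public static void putInt(ByteBuffer buf, int value) {
        buf.put(INT_CODE);
        buf.putInt(value);        // primitive write, no boxing
    }

    public static void putFloat(ByteBuffer buf, float value) {
        buf.put(FLOAT_CODE);
        buf.putFloat(value);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[64]);
        putInt(buf, 42);
        putFloat(buf, 3.14f);
        System.out.println(buf.position()); // 10 bytes: 2 codes + 4 + 4
    }
}
```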

The DispatchingDecoder works similarly to a SAX-style parser: it
iterates over the encoded content and dispatches values to a handler.
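
A rough sketch of that shape (handler interface and names are
illustrative, not the mock up's actual API): an internal loop reads each
format code and calls out with the decoded primitive.

```java
import java.nio.ByteBuffer;

// Illustrative SAX-style dispatch: the decoder owns the loop and
// pushes each primitive value to a caller-supplied handler.
public class DispatchSketch {
    interface Handler {
        void onInt(int value);
        void onFloat(float value);
    }

    static final byte INT_CODE = 0x71;   // AMQP 1.0 int
    static final byte FLOAT_CODE = 0x72; // AMQP 1.0 float

    // Decode everything up to the buffer's limit, dispatching per value.
    public static void decode(ByteBuffer buf, Handler handler) {
        while (buf.hasRemaining()) {
            byte code = buf.get();
            switch (code) {
                case INT_CODE:   handler.onInt(buf.getInt()); break;
                case FLOAT_CODE: handler.onFloat(buf.getFloat()); break;
                default: throw new IllegalArgumentException("code " + code);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[16]);
        buf.put(INT_CODE).putInt(7).put(FLOAT_CODE).putFloat(1.5f);
        buf.flip();
        decode(buf, new Handler() {
            public void onInt(int v)     { System.out.println("int " + v); }
            public void onFloat(float v) { System.out.println("float " + v); }
        });
    }
}
```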

The StreamingDecoder is similar to the DispatchingDecoder, except that
instead of an internal bytecode loop calling out to a handler, it is
externally driven by calling into the decoder. This appears to be
marginally slower than the DispatchingDecoder in the particular scenario in
the mock up; however, it may have some API benefits, e.g. conversions can
be done on demand and it is possible to skip over uninteresting data rather
than parsing it.

The mock up also includes the same data being encoded/decoded using the
existing codec (with Clebert's patch).

Fair warning, the data I chose to encode/decode is completely arbitrary and
not intended to be representative at all. That said, the numbers I'm
getting suggest to me that we can do a whole lot better than the current
codec if we start with something simple and keep it that way. Here is the
output I'm getting for a run with a hundred million iterations:

  simple encode: 4416 millis
  dispatching decode: 3049 millis
  streaming decode: 3243 millis
  existing encode: 9515 millis
  existing decode: 13931 millis

Another factor to consider is the difficulty in quantifying the impact of
generating lots of garbage. In a small benchmark like this there isn't a
lot of memory pressure, so extra garbage doesn't have a lot of impact,
however in a real application that would translate into increased GC cycles
and so might be more of a factor. What I can say from watching memory usage
under the profiler is that at any given point there are typically hundreds
of megs worth of garbage Integer and UUID instances lying around when the
existing codec is running. None of the alternative strategies I've
included generates any garbage.

--Rafael
/*
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * License); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 *
 */
package org;

import java.util.UUID;
import org.apache.qpid.proton.codec.*;
import java.nio.*;

/**
 * Codec
 *
 */

public class Codec
{

    public static void main(String[] args) {
        int loop = 10*1024*1024;
        if (args.length > 0) {
            loop = Integer.parseInt(args[0]);
        }

        String test = "all";
        if (args.length > 1) {
            test = args[1];
        }

        boolean runDispatching =
            test.equals("all") || test.equals("dispatching");
        boolean runStreaming = test.equals("all") || test.equals("streaming");
        boolean runExisting =
            test.equals("all") || test.equals("existing");

        byte[] bytes = new byte[1024];
        ByteBuffer buf = ByteBuffer.wrap(bytes);

        long start, end;

        if (runDispatching || runStreaming) {
            start = System.currentTimeMillis();
            int size = simpleEncode(bytes, loop);
            end = System.currentTimeMillis();
            time("simple encode", start, end);

            if (runDispatching) {
                start = System.currentTimeMillis();
                dispatchingDecode(bytes, size, loop);
                end = System.currentTimeMillis();
                time("dispatching decode", start, end);
            }

            if (runStreaming) {
                start = System.currentTimeMillis();
                streamingDecode(bytes, size, loop);
                end = System.currentTimeMillis();
                time("streaming decode", start, end);
            }
        }

        if (runExisting) {
            start = System.currentTimeMillis();
            DecoderImpl dec = existingEncode(buf, loop);
            end = System.currentTimeMillis();
            time("existing encode", start, end);

            buf.flip();

[remainder of attachment truncated in the archive]

Re: codec strategies

2014-05-13 Thread Rafael Schloming
On Thu, May 8, 2014 at 9:42 AM, Alan Conway acon...@redhat.com wrote:

 I vote for DispatchingDecode: it's the simplest, the fastest, and is
 based on a well-established parsing pattern with a good track record for
 performance (SAX). It's not so hard to ignore data in a handler.

 Writing a handler state machine is a bit more complex than writing a
 sequence of calls to a stream API, but I think you could encapsulate
 most of it in a standard state machine that, given a sequence of type
 codes, fills a sequence of variables. I'm not sure about the right way
 to do that in Java performance-wise.

 Hmm. That might be worth another performance test though - if you did
 have such tools for making it easy to build handlers, would those tools
 introduce a penalty that would make the StreamingDecode look more
 attractive...


The biggest difference from an API perspective has to do with data
conversion/coercion. Say you're writing a piece of Java code that wants to
operate on a Java int or a Java float and doesn't care what the
underlying wire type is, so long as it can be reasonably converted to that
type. In a stream-style API you would simply write:

  int i = decoder.getInt();
  float f = decoder.getFloat();

The decoder implementation itself can then be smart enough to convert
whatever underlying wire type there might be into the appropriate Java
type. The SAX API, on the other hand, will have a distinct callback for
byte vs ubyte vs short vs ushort, etc., and it could be quite cumbersome to
convert all the different possibilities into the type you actually want to
operate on. Put another way, the stream-style API is capable of
incorporating the desired output type of the user, whereas the SAX-style
API is not.
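
A minimal sketch of what such a coercing getInt() might look like (the
class name is hypothetical; the format codes are AMQP 1.0's byte, short,
and int codes): whichever compatible wire type is next gets widened to
the type the caller asked for.

```java
import java.nio.ByteBuffer;

// Illustrative pull-style decoder that coerces int-like wire types.
public class CoercingDecoder {
    private final ByteBuffer buf;
    CoercingDecoder(ByteBuffer buf) { this.buf = buf; }

    public int getInt() {
        byte code = buf.get();
        switch (code) {
            case 0x51: return buf.get();       // AMQP byte  -> int
            case 0x61: return buf.getShort();  // AMQP short -> int
            case 0x71: return buf.getInt();    // AMQP int   -> int
            default: throw new IllegalArgumentException("not int-like: " + code);
        }
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[8]);
        b.put((byte) 0x61).putShort((short) 123);  // a short on the wire
        b.flip();
        System.out.println(new CoercingDecoder(b).getInt()); // prints 123
    }
}
```

A SAX-style equivalent would need the same switch spread across
separate onByte/onShort/onInt callbacks, which is the cumbersomeness
described above.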

It might be possible to provide some kind of coercing handler that would
help the SAX situation. As you say though it's probably worth checking that
something like that would be usable and not introduce its own penalties.

--Rafael