Just from skimming some of the relevant docs (not having written a driver for Apache Ignite before), some thoughts:

 * It does indeed look like there is enough info, both as documentation
   and example code, to write codecs and drivers for Ignite
 * The formats and protocols look rather baroque, with significant
   historical baggage -- it's going to take quite a bit of work to get
   a fully compliant driver, though it does look like a smaller subset
   could be built to match just a particular need
 * There is a strong Java flavor to everything; there is some impedance
   mismatch with Raku (such as the Char array type
   <https://ignite.apache.org/docs/latest/binary-client-protocol/data-format#char-array>,
   which is an array of UTF-16 code units that doesn't necessarily
   contain valid decodable text)
 * There seems to be a tension in the design between the desire to
   support a schema-less/plain-data mode and a schema/object mode; Raku
   easily has the metaobject protocol chops to make the latter possible
   without invoking truly deep magic, but it does require somewhat more
   advanced knowledge to write
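The char-array mismatch above is worth seeing concretely. A minimal sketch (in Python for illustration, since the asker already uses Ignite's Python client; `decode_char_array` is a hypothetical helper, not part of any Ignite client): a char array is a sequence of UTF-16 code units, and a lone surrogate such as 0xD800 is a perfectly legal code unit that nevertheless cannot be decoded as text, so a faithful codec has to be able to carry raw code units rather than a decoded string.

```python
import struct

def decode_char_array(payload: bytes) -> list[int]:
    """Decode little-endian UTF-16 code units into plain integers."""
    count = len(payload) // 2
    return list(struct.unpack("<%dH" % count, payload))

# 'A', a lone high surrogate (0xD800), 'B'
payload = b"\x41\x00\x00\xd8\x42\x00"
print(decode_char_array(payload))   # [0x0041, 0xD800, 0x0042]

try:
    payload.decode("utf-16-le")
except UnicodeDecodeError:
    print("not decodable as text")  # the codec must keep the raw code units
```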

So in short: it looks doable, but it's a fair chunk of work depending on how complete you need it to be, and some decisions need to be made about how pedantically to support their Java-flavored APIs.


On 1/3/22 7:39 PM, Piper H wrote:
Glad to hear these suggestions, @Geoffery.
I also have a question: this product has a well-defined binary protocol. Do you know how to port it to Perl or Raku?
https://ignite.apache.org/docs/latest/binary-client-protocol/binary-client-protocol
I was using their python/ruby clients, but there is not a perl version.

Thanks.
Piper

On Tue, Jan 4, 2022 at 11:15 AM Geoffrey Broadwell <g...@sonic.net> wrote:

    I love doing binary codecs for Raku[1]!  How you approach this
    really depends on what formats and protocols you want to create
    Raku modules for.

    The first thing you need to be able to do is test if your codec is
    correct.  It is notoriously easy to make a tiny mistake in a
    protocol implementation and (especially for binary protocols) miss
    it entirely because it only happens in certain edge cases.

    If the format or protocol in question is open and has one or more
    public test suites, you're in good shape.  Raku gives a lot of
    power for refactoring tests to be very clean, and I've had good
    success doing this with several formats.

    If there is no public test suite, but you can find RFCs or other
    detailed specs, you can often bootstrap a bespoke test suite from
    the examples in the spec documents. Failing that, sometimes you
    can find sites (even Wikipedia, for the most common formats) that
    have known-correct examples to start with, or have published
    reverse engineering of files or captured data.
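    Bootstrapping a test suite from spec examples can be as simple as a table of known-correct vectors. A sketch (Python for illustration; a real codec covers all the types, this one only handles small unsigned integers): RFC 8949, the CBOR spec, lists known-correct encodings in its Appendix A, and a first-pass encoder can be checked against them directly.

```python
def cbor_encode_uint(n: int) -> bytes:
    """Encode a CBOR unsigned integer (major type 0, small arguments only)."""
    if n < 24:
        return bytes([n])                       # value fits in the initial byte
    if n < 0x100:
        return bytes([0x18, n])                 # 1-byte argument
    if n < 0x10000:
        return bytes([0x19]) + n.to_bytes(2, "big")  # 2-byte argument
    raise NotImplementedError("larger arguments omitted from this sketch")

# Test vectors taken from RFC 8949 Appendix A:
SPEC_EXAMPLES = {10: bytes.fromhex("0a"),
                 100: bytes.fromhex("1864"),
                 1000: bytes.fromhex("1903e8")}
for value, expected in SPEC_EXAMPLES.items():
    assert cbor_encode_uint(value) == expected
```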

    If the format is truly proprietary, you'll be getting lots of
    reverse engineering practice of your own. 😉

    Now that you have some way of testing correctness, you'll want to
    be able to diagnose the incorrect bits.  Make sure you have some
    way of presenting easily-readable text expansions of the binary
    format, because just comparing raw buffer contents can be rather
    tedious (though I admit to having found bugs in a public test
    suite by spending so much time staring at the buffers that I could
    tell they'd messed up a translation in a way that made the test
    always pass).  If the format or protocol has an official text
    translation/diagnostic/debug format -- CBOR, BSON, Protobuf, etc.
    all have these -- so much the better; you should support that
    format as soon as practical.
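    Even before supporting an official diagnostic format, a tiny hex-dump helper pays for itself the first time a test fails. A minimal sketch (Python for illustration; `hexdump` is a hypothetical helper, and real tools offer much richer output):

```python
def hexdump(data: bytes, width: int = 8) -> str:
    """Render a buffer as offset / hex / printable-ASCII lines."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join("%02x" % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%04x  %-*s  %s" % (off, width * 3 - 1, hexpart, text))
    return "\n".join(lines)

print(hexdump(b"Ignite\x00\x01\x02"))
```

Printing both sides of a failed buffer comparison through something like this turns "these 200-byte blobs differ" into "byte 0x2c is 0x00 on one side and 0x01 on the other".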

    Once you get down to the nitty-gritty of writing the codec, I find
    it is very important to make it work before making it fast.  There
    is a lot of room for tuning Raku code, but it is WAY easier to get
    things going in the right direction by starting off with idiomatic
    Raku -- given/when, treating the data buffer as if it was a normal
    Array (Positional really), and so on.
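    The "make it work first" shape of a decoder looks roughly like this in any language (a Python sketch over a hypothetical two-type tagged wire format, purely for illustration; in Raku the dispatch would naturally be given/when over a Buf treated as Positional):

```python
def decode_value(buf: bytes, pos: int):
    """Return (value, new_pos). Tag 1 = uint8, tag 2 = length-prefixed bytes."""
    tag = buf[pos]
    if tag == 1:                                # simplest possible dispatch:
        return buf[pos + 1], pos + 2            # readable first, fast later
    elif tag == 2:
        length = buf[pos + 1]
        return bytes(buf[pos + 2:pos + 2 + length]), pos + 2 + length
    else:
        raise ValueError("unknown tag %d" % tag)

value, pos = decode_value(b"\x01\x2a", 0)       # value == 42, pos == 2
```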

    Make sure that every protocol feature you add makes new tests
    pass, and (I find at least) write the encoding and decoding bits
    at the same time, so you can check that you can round-trip data
    successfully.  For the love of all that is good, don't implement
    any obscure features before the core features are rock solid and
    pass the test suite with nary a hiccup.
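    The round-trip discipline reduces to one property: decode(encode(x)) == x for every feature you add. A sketch (Python for illustration; `encode_str`/`decode_str` are a hypothetical length-prefixed string codec, not any real protocol's):

```python
def encode_str(s: str) -> bytes:
    """Encode as a 2-byte big-endian length prefix plus UTF-8 payload."""
    raw = s.encode("utf-8")
    return len(raw).to_bytes(2, "big") + raw

def decode_str(buf: bytes) -> str:
    """Inverse of encode_str."""
    length = int.from_bytes(buf[:2], "big")
    return buf[2:2 + length].decode("utf-8")

# The round-trip property, checked over awkward cases too:
for sample in ["", "Ignite", "h\u00e9llo", "\u65e5\u672c\u8a9e"]:
    assert decode_str(encode_str(sample)) == sample
```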

    After that, when you think you're ready to optimize, write
    performance /tests/ first.  Make sure you test with data that will
    both use your codec in a typical manner, and also test out all the
    odd corners.  You're looking for things that seem weirdly slow;
    this usually indicates a thinko like copying the entire buffer
    each time you read a byte from it, or somesuch.
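    That copy-the-buffer thinko is easy to demonstrate. A sketch (Python for illustration; both functions are hypothetical stand-ins for a decoder's inner loop): slicing off the consumed prefix copies the whole remaining buffer on every read, turning a linear scan quadratic, while tracking an offset does no copying at all.

```python
import time

def sum_by_slicing(buf: bytes) -> int:
    total = 0
    while buf:
        total += buf[0]
        buf = buf[1:]           # copies the whole remaining buffer each time!
    return total

def sum_by_offset(buf: bytes) -> int:
    total = 0
    for i in range(len(buf)):   # index into the buffer; no copies
        total += buf[i]
    return total

data = bytes(20_000)
for fn in (sum_by_offset, sum_by_slicing):
    start = time.perf_counter()
    fn(data)
    print(fn.__name__, "%.3fs" % (time.perf_counter() - start))
```

Both produce the same answer, which is exactly why a correctness-only test suite never catches this; only a performance test that times weirdly slow cases will.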

    Once you've got the obvious performance kinks worked out, come by
    and ask again, and we can give you further advice from there.  Or
    heck, just come visit us on IRC (#raku at Libera.chat), and we'll
    be happy to help.  (Do stick around for a while though, because
    traffic varies strongly by time of day and day of week.)

    Best Regards,


    Geoff (japhb)


    [1]  I'm a bit of a nut for it, really.  In the distant past, I
    wrapped C libraries to get the job done, but more recently I've
    done them as plain Raku code (and sometimes NQP, the language that
    Rakudo is written in).

    I've written some of the binary format codecs for Raku:

      * https://github.com/japhb/CBOR-Simple
      * https://github.com/japhb/BSON-Simple
      * https://github.com/japhb/Terminal-ANSIParser
      * https://github.com/japhb/TinyFloats

    Modified or tuned others:

      * https://github.com/samuraisam/p6-pb/commits?author=japhb
      * https://github.com/japhb/serializer-perf
      * (Lots of stuff spread across various Cro
        <https://github.com/croservices> repositories)

    Added a spec extension for an existing standardized format (CBOR):

      * https://github.com/japhb/cbor-specs/blob/main/capture.md

    And I think I forgot a few things.  😁

