I'm adding the ovs-dev list since this discussion may be useful to the general development community.

See my responses inline.

On 9/20/19 8:04 PM, Leonid Ryzhyk wrote:
Hi Mark,

Firstly, many thanks for giving DDlog a try!  I am extremely impressed that you managed to get this far without any help from us.  Hopefully, we will be able to help make life easier for you going forward.

Happy to hear that you see the pros of using DDlog in OVN. Your summary of where it helps to improve things is exactly what we intended it for. Let's see if we can help with some of the problems!

* The need for Rust code:
   It's certainly true that DDlog adds yet another language (Rust) in the mix. One of the things on our roadmap is to expand DDlog's libraries with a more complete set of string manipulation routines, so that many of the functions that currently must be implemented in Rust can be coded directly in DDlog. Having said that, it's interesting that you had to implement your string manipulation in Rust.  In my experience with OVN, most string processing logic is already implemented in C, so in most cases all I had to do was to write a Rust wrapper around an existing C function.


Unfortunately, this wasn't possible without exporting some new C functions. In this particular case, the string parsing is taking care of the "addresses" field of logical switch ports. The field can look like any of the following:

"<MAC>"
"<MAC> <IPv4 address>"
"<MAC> <IPv6 address>"
"dynamic"
"<MAC> dynamic"
"dynamic <IPv4 address>"
"dynamic <IPv6 address>"

The first five of these can be done using an exported C function (extract_addresses(), I believe). The final two are handled internally within ovn-northd.c and not exported. Therefore, in the ddlog version, these two are handled in northd/ovn.rs. The sixth version is handled with scan_static_dynamic_ip(). The final version was not handled at all, so I added scan_static_dynamic_ip6() to attempt to cover it.


* Failure resilience:
   I think your summary is correct.  I don't really understand `ovn-northd-ddlog.c`, but Justin did mention that he plans to cleanup and harden it. DDlog itself should be able to handle intermittent failures, e.g., if we lose connection to Southbound DB and then restore it, DDlog can just pick up the new database state and generate the set of changes needed to bring it up to speed with the NB database.

Yes, my impressions were that this wasn't really a fault of DDLog as much as the program driving it. I'm guessing that the reason for not using the OVSDB IDL is that it doesn't have the hooks necessary for DDLog. Rather than modify the OVSDB IDL, ovn-northd-ddlog.c essentially copies the logic of the IDL and inserts code where necessary for DDLog usage.


* Debuggability:
  I agree that debugging is a problem. We keep building new debugging tools, and it sounds like you've already tried some of them, but I can totally see that these may not always be enough.  I think the best strategy is to go case-by-case and figure out how we can make it easier to troubleshoot every particular type of error in the future. Can you point me to your code that fails?

Sure: https://github.com/putnopvut/ovn/commit/0ea3b23d1b6a11f88a870acc4cf15b324c3ffb35

I imagine you'll take one look and immediately realize what's wrong :)

From my POV, what would have made this easiest to figure out was the ability to open a debugger and insert break points into the DDlog to determine the current values at various steps. I think the closest I'd be able to do right now would be to step through the generated Rust code instead. It's possible that I could figure out what I'm doing wrong that way, but it would be so much better if it were possible to debug at the DDLog level.


* Compile times
  Yep, those are driving me nuts, and sadly I don't think we will have a radical solution in the near future. Have you tried `--enable-ddlog-fast-build`? In my experience, this speeds up compilation by a factor of 2, at the cost of the compiled code being twice slower, which is usually not a problem during development. Also, make sure you do _not_ use `--enable-ddlog-northd-cli` unless you need to do replay debugging with CLI. I think, with those two improvements I get compilation times in the order of 5 minutes on my laptop.

I have not enabled the fast build option, nor have I enabled ddlog-northd-cli. I'll give the fast build option a try. Is there a reason it is not enabled by default?


Leonid
------------------------------------------------------------------------
*From:* Mark Michelson <[email protected]>
*Sent:* Friday, September 20, 2019 2:31 PM
*To:* Leonid Ryzhyk <[email protected]>; Numan Siddique <[email protected]>; Lorenzo Bianconi <[email protected]>; Dumitru Ceara <[email protected]>
*Subject:* DDLog after one week
Hi Leonid,

After learning, reading, and using DDLog for a week, I figure I should
give my thoughts.

On the positive side, I think that once we get used to writing DDLog, it
will be easier to maintain and add to than the current incremental
engine we have in the C code. The language requires different thinking
than we're used to, but that's not a bad thing.

Regarding language features, I like the memory safety and strong type
system. It reminds me of using a functional language (e.g. Haskell) in
that regard. I also like the fact that its rules engine makes for easy
incremental processing of the code. I only have to think about how a
relation translates to another relation. I don't need to think about the
deltas to the existing items or anything like that.

On the not-exactly-positive-but-also-not-exactly-negative side, I was
surprised about the amount of Rust I might have to know and use when
using DDLog. For instance, the issue I chose to take on required more
Rust coding than DDLog coding. That's because my change required
changing some string parsing code, and that's offloaded to Rust.

I have three main negatives to share:

1) The current implementation is not resilient in the face of failures.
For example, if ovsdb transactions don't go properly, there is a chance
that the code can enter a busy loop and no longer function. I get the
feeling that ovn-northd-ddlog.c was written with the mindset of getting
something working quickly. It doesn't use the ovsdb IDL, and as a
result, much of the state machine is re-copied locally. I don't think
failure scenarios are properly handled.

2) Debuggability. While I was making my changes, I found that what I had
written had resulted in a database commit failing. However, even when
replaying DDLog, dumping a relevant DDLog relation, and adding printfs,
I still have no idea why my code is wrong. I'm kind of just stuck at
this point. Even more of a concern is debugging live deployments.

3) Compile times. While working on my improvement, I found that making
changes and recompiling resulted in 10+ minute builds every time. It
didn't matter how small or large my changes were. When trying to debug
why a test is failing, and incrementally adding more changes to the
code, this is excruciating.

I've included Numan, Dumitru, and Lorenzo on this message so they can
chime in with their thoughts as well. Note that for them it's already
night time on Friday, so they may not add their thoughts until after the
weekend :)

Thanks,
Mark Michelson

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to