Re: [USRP-users] FPGA Repository and Stability

Ashish Chaudhari via USRP-users Tue, 04 Sep 2018 18:08:46 -0700

On Tue, Sep 4, 2018 at 4:51 PM, Brian Padalino <[email protected]> wrote:
> On Tue, Sep 4, 2018 at 6:55 PM Ashish Chaudhari <[email protected]>
> wrote:
>>
>> On Tue, Sep 4, 2018 at 8:07 AM, Brian Padalino via USRP-users
>> <[email protected]> wrote:
>> > Recently there was a significant change to the noc_block_vector_iir
>> > (specifically the vector_iir):
>> >
>> >
>> >
>> > https://github.com/EttusResearch/fpga/commit/05ca30fe91330d54dcd005a3af4aeaa0dc26c8f8#diff-4b21f52231ba9f82bf132f633d594187
>> >
>> > It's a pretty significant re-write of the block, and this behavior has
>> > me
>> > asking a few questions:
>> >
>> > 1) Can anyone at Ettus expand on the reason for the major re-write?  It
>> > appears that the block itself worked previously so it seems like
>> > re-writing
>> > it was a lot of effort to potentially add new bugs to things which
>> > weren't
>> > there previously.
>> >
>>
>> There were several reasons that contributed to the re-writing that
>> block. First off, there was a bug in the delay line implementation
>> that needed to be fixed. Secondly, we wanted to improve the
>> readability and resource utilization of blocks where users hand-write
>> their DSP (as opposed to using third-party AXI-Stream IP). The old
>> approach was to implement a DSP algorithm using basic operations like
>> addition, multiplication, rounding, etc each with AXI-Stream
>> handshaking. This has the following drawbacks: 1) The code is a bit
>> hard to read because the AXI-Stream stuff takes up code real-estate 2)
>> It's hard to let the tools infer DSP primitives automatically because
>> AXI-Stream can get in the way of the standard Xilinx-intended control
>> sets. 3) The AXI-Stream logic takes up more fabric resources. The
>> Vector IIR block is the first block that wraps AXI-Stream around the
>> entire DSP algorithm instead of around basic functions. This is the
>> way we intend to write future blocks with hand-implemented DSP. The
>> change is actually a step forward in terms of stability because the
>> API to the actual DSP kernel is much simpler and intuitive, and is
>> much closer to how FPGA DSP implementation is taught. The simpler API
>> also makes it easy for the Xilinx tools to perform
>> optimization/mapping because AXI-Stream handshaking is not in the way.
>> Don't get me wrong, AXI-Stream is great, we are just changing the
>> granularity at which we implement it.
>
>
> Why start with this block instead of just blocks going forward?  I saw the
> addition of the biquad - great!  But, ideally, you'd fix the bug in the
> delay line and move on if that's what drove people looking into the behavior
> to begin with?
>
> I prefer less AXI streaming interfaces as well, but it just seems like a
> poor idea to rewrite an already written block and change something out from
> underneath everyone.
>
>>
>> > 2) Why wasn't this re-written block added as a second option for a
>> > vectored
>> > IIR implementation?  There are 64-bits worth of RFNoC block ID's out
>> > there.
>> > I understand if you don't want to support specific RFNoC blocks anymore,
>> > and
>> > want to move onto new ones, but can you retire them in a better, more
>> > thought out way?  Moving them to a deprecated folder is fine and giving
>> > one
>> > or two releases to get people to move away is fine, but please stop
>> > gutting
>> > and re-writing something like it hasn't completely changed.  There are
>> > people who may want to stick with the old way of doing things (the old
>> > DDC/DUCs are another recent re-write that could have been handled
>> > better).
>> > Please give us a chance at trying to maintain stability.
>> >
>>
>> Other than fixing the delay line issue, we did not change the behavior
>> of the block. We also did not change the software or FPGA interface to
>> the block. All that changed was the implementation, and that does not
>> warrant a new block or a deprecated copy. You can argue that if
>> something/anything changes in a piece of functional code, that it is
>> inherently unstable. However, that applies to everything from a one
>> liner bugfix to complete rewrite. That said, if you noticed a bug in
>> the code, then we will absolutely need to and will fix it.
>>
>> In general, RFNoC is still evolving and we are slowly cleaning up
>> interfaces to reach a point where we are confident that it is easy to
>> use and it efficiently supports what our customers want to do. We
>> always try to balance improvements with backwards compatibility, but
>> there are cases where we cannot guarantee compatibility. In those
>> cases we will try to minimize the incompatible changes, and increment
>> the compatibility number to ensure that changes don't go unnoticed to
>> software. None of that was true for this particular block change.
>
>
> No offense, but you're wrong in many ways.  From an implementation
> standpoint, your re-write isn't a drop-in replacement.  It requires code
> changes for anyone utilizing the vector_iir module.  On top of that, if bugs
> were found in new implementations, but not in old ones, it's not easy to
> roll back the changes.
>


I beg to differ. You can't talk about API stability without defining
the layer of abstraction. Yes, the changes are not a drop in
replacement for whoever was using vector_iir.v; however, they are for
whoever is using noc_block_vector_iir.v. You mentioned maintaining
binary compatibility in software. That's because we have an API with
exported C++ symbols that we maintain compatibility for. There are a
bunch of modules under the hood that are re-written and replaced on a
monthly basis that you can't even see. The same is true in the FPGA.
We have to draw a line in the sand and say this is our layer of
abstraction. Unfortunately, it's a bit harder to do in the FPGA than
it is in software. However, that concept still has to hold true and
currently noc_shell and axi_wrapper is our supported layer of
abstraction for the framework, and the crossbar interface for
noc_blocks. Granted we have to do a better job of documenting and
maintaining compatibility for those interfaces, and we are working on
improving and publishing specs for them. However, we cannot possibly
make that guarantee for all every last module in the design, and
reserve the right to change those at any point. If you absolutely need
to roll back changes, you can use version control to do so.
Maintaining an old copy of every file we change is a poor way of doing
software maintenance.

> Take the DDC/DUC changes for example.  I understand why they were done.  I
> understand they are more efficient.  If I wanted to use the old verion of
> the DDC/DUC in a design, it is very difficult to bring that back - even to
> just do an A/B comparison in case there is a bug.  The old code worked fine.
> It was not broken or riddled with bugs, but for some reason Ettus decided to
> just ditch it all.
>

We decided to ditch it because we improved the old code and saw no
benefit to maintaining two different versions that are almost
identical in behavior. Code maintenance is expensive and it is
pointless to do if it doesn't provide any benefit. This is not a
general statement though, there might be instances where we have two
implementations with different optimization goals or behavioral
tradeoffs. In that case we will most like have a parameter passed into
a module and select from multiple implementations that we maintain at
the same time. That was just not true for the DDC/DUC changes. Again,
if you need to compare old vs new behavior, you can just go back in
time in git.

> Another example is the noc_shell compatibility jump from 4 to 5 changing the
> RB_FIFOSIZE to be bytes instead of log2(bytes):
>
>
> https://github.com/EttusResearch/fpga/commit/32a0b8985ae9ab636957321c5b51e7d55cf30bb7
>
> I understand the flow control is now byte based in FPGA's master, but if
> it's the case that the buffering doesn't need to be a power of 2, then why
> wasn't STR_SINK_FIFOSIZE changed as well during this modification?

Because the power-of-2 change is not inherently related to byte-based
flow control.

> Moreover, if this is the only real change for the noc_shell, can't the host
> understand version 4 or 5 of the noc_shell and handle the readback pretty
> easily?  Why did this need to be deprecated and thrown out on the host side
> code?

See my comment above regarding maintaining multiple implementations of
the same thing. It does not add value by itself to maintain two
different ways to read a buffer size. For the layer of abstraction
that we have determined i.e. noc_shell (FPGA) and block_ctrl (SW), the
readback register commit does not impact the user API. It's a
transparent framework change in how UHD talks to the FPGA. It just
requires UHD and FPGA to be updated in lock-step.

Lastly, if this FPGA code, being in it's own repository, has a
> corresponding commit or set of commits associated with the uhd repository,
> can you please have one, the other, or both reference the changed commits in
> the other repositories?
>

The UHD repository has a link to the FPGA repository via the fpga-src
submodule. We must update that every time there is a backwards
incompatible change between UHD and FPGA. In addition we also upload
pre-built images to allow users to get the new FPGA bitstreams without
building them. For other changes, we may update the fpga-src pointer
periodically.


> You have an interest in RFNoC from the community, but you're making it very
> difficult to embrace due to changes that are not conveyed very well and
> seemingly random changes that cause problems finding and maintaining
> stability.  It makes code bisection on the host side pretty much impossible
> to do when trying to find newly introduced bugs.
>
> I'll ask once more, though I think it'll fall on deaf ears: please consider
> keeping old code around a little bit longer, and phasing things out to allow
> people to get their footing as you make your design changes with RFNoC.
>

I think the real problem here is documentation, especially for the
user-facing interfaces, and I assure you that we are working on
improving it. As I said above, it may not be clear where that line is
the sand is drawn so we are working on publishing a spec for that
layer of abstraction so it is clear what we can change and what we
can't. Also, your feedback is definitely not falling on deaf ears, we
take it very seriously. We also take code maintainability seriously
and are moving to improving our interfaces to that end. However, we
have to balance bleeding edge and legacy, along with new development
cost and maintenance cost. Your suggestion to keep old copies of every
module is just not scalable, but we are addressing the underlying
problem which is interface stability. Writing down the FPGA interface
should give power-users like you the clarity in terms of the
maintenance (update) cost associated with the modules in our codebase
that you plan to use. If we make a change to this stable interface
then we need to (and will) go through the deprecation process that you
mentioned. It's rolling out over time though. Outside of that, we
expect user to rollback in git to grab older versions of code if they
need to.

> Brian

_______________________________________________
USRP-users mailing list
[email protected]
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

Re: [USRP-users] FPGA Repository and Stability

Reply via email to