I'm working on a proper, robust and fast polygon filler*, as well as always looking to improve my line drawer, and have recently discovered Bresenham's alternative run-slice algorithm, via a Michael Abrash article published in Dr Dobbs in November '92. It switches you from making a decision every pixel (eg, stepping along x, deciding whether to make an adjustment in y after every pixel) to make a decision at the end of every slice (so if you were drawing a line 256 pixels wide and 2 pixels high, the first 128 pixels would be one slice that you knew the length of before drawing, the next 128 would be another slice; the length of every slice in a line varies by no more than 1 pixel so that's the decision) at the cost of an 8bit by 8bit integer divide with remainder outside the loop.
This is explicitly still error tracking and proper integer arithmetic ala per-pixel Bresenham, there are no fractional parts. This isn't DDA in fixed point (which would be a much more than 8bit by 8bit integer divide) or anything like that. http://www.gamedev.net/reference/articles/article386.asp seems to be a facsimile of the article, albeit with the figures and listings missing somewhere. The way I see it, dividing costs less per bit than Bresenham does per pixel (using, e.g. those restoring divide as demonstrated at http://baze.au.com/misc/z80bits.html), so this could start costing less even on very small lines. It'll almost definitely cost less for polygon filling since you mainly want to know x intercept per scanline, so I can switch between regular Bresenham and run slice depending on the edge slope. With a 1/2 offset on the initial run of the slicing algorithm (so I'm picking the centre pixel for every slice), my C tests imply a great result with all the features you'd like (adjacent polygons always exactly meet up, the switch from per-pixel to run-slice isn't visible). That all said, I know I'm often hopelessly optimistic about these things. Has anyone with more experience looked into and/or implemented run-slice line drawing? * the one I used for my youtube videos leaves the occasional empty pixel between adjacent polygons, is limited to triangles and uses costly fixed-point division for every single edge. So: lots and lots of unnecessary 16bit divides.