You make this way too complicated because you are trying to be way too
perfect. If you scan 3 bitmap blocks and find nothing, stop trying to
size match.
Hans
Jeffrey Mahoney wrote:
> Jeff Mahoney wrote:
>
> >Hans Reiser wrote:
>
> >>Guys, if you run the kernel under a debugger, and get it to where you
> >>see the excessive CPU usage, and then start stepping through the bitmap
> >>code, I am sure it will be very obvious what the error is. Can anyone
> >>do that for us? Jeff?
>
> >Apologies to everyone CC'd who has already seen this message. It was
> >bounced from the namesys servers and I wanted to preserve the CC list.
>
> >***
>
> >Mike sent me a copy of the metadata and I am now able to reproduce
> >locally. My profiling looks like this:
>
> >samples  %        image name   app name  symbol name
> >148596   17.8573  reiserfs.ko  reiserfs  reiserfs_in_journal
> >58194    6.9934   reiserfs.ko  reiserfs  search_by_key
> >38937    4.6792   vmlinux      vmlinux   memmove
> >38783    4.6607   reiserfs.ko  reiserfs  scan_bitmap_block
> >38466    4.6226   jbd          jbd       (no symbols)
> >23249    2.7939   vmlinux      vmlinux   __find_get_block
> >18196    2.1867   vmlinux      vmlinux   tty_write
> >17734    2.1312   vmlinux      vmlinux   do_ioctl
> >17293    2.0782   loop         loop      (no symbols)
> >15400    1.8507   vmlinux      vmlinux   cond_resched_lock
> >14836    1.7829   vmlinux      vmlinux   copy_user_generic_c
> >14143    1.6996   reiserfs.ko  reiserfs  do_journal_end
> >13638    1.6389   vmlinux      vmlinux   find_next_zero_bit
> >13236    1.5906   vmlinux      vmlinux   default_llseek
> >12925    1.5532   vmlinux      vmlinux   bit_waitqueue
> >8921     1.0721   vmlinux      vmlinux   __delay
>
>
> >Hans -
>
> >My speculation about the bitmaps being fragmented was right on. I wrote
> >a quick little script to parse the output of debugreiserfs -m and report
> >on the frequency of different window sizes. Windows of 1-31 blocks are
> >extremely common, accounting for 99.8% of all free windows. The problem
> >is that in my testing, where I made the allocator report the size of
> >allocation requests, the most common request was for a window of 32
> >blocks.
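>
> >For reference, the counting boils down to something like the sketch
> >below. It's just an illustration that walks a raw in-memory copy of a
> >bitmap block instead of parsing the debugreiserfs -m text output, and
> >the names (test_le_bit, count_windows) are made up for the example:
>
> >/* Sketch: histogram the free-window sizes in one bitmap block.
> > * "map" is a raw copy of the bitmap, "bits" its length in bits, and
> > * "hist" needs room for bits + 1 counters, indexed by window size. */
> >static int test_le_bit(unsigned long nr, const unsigned char *map)
> >{
> >    return (map[nr >> 3] >> (nr & 7)) & 1;
> >}
> >
> >static void count_windows(const unsigned char *map, unsigned long bits,
> >                          unsigned long *hist)
> >{
> >    unsigned long i = 0;
> >
> >    while (i < bits) {
> >        if (test_le_bit(i, map)) {      /* block is in use */
> >            i++;
> >            continue;
> >        }
> >        unsigned long start = i;        /* a free window starts here */
> >        while (i < bits && !test_le_bit(i, map))
> >            i++;
> >        hist[i - start]++;              /* one more window of this size */
> >    }
> >}
>
> >The histogram is indexed by exact window size, which is how the 1-31
> >buckets showed up as holding 99.8% of the free windows on this image.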
>
> >What's happening is that we keep finding windows that are too small,
> >which results in a lot of wasted effort. The cycle goes like this:
>
> >if (unfm && is_block_in_journal(s, bmap_n, *beg, beg))
> > continue;
> >/* first zero bit found; we check next bits */
> >for (end = *beg + 1;; end++) {
> > if (end >= *beg + max || end >= boundary
> > || reiserfs_test_le_bit(end, bi->bh->b_data)) {
> > next = end;
> > break;
> > }
> > /* finding the other end of zero bit window requires
> > * looking into journal structures (in
> > * case of searching for free blocks for unformatted nodes) */
> > if (unfm && is_block_in_journal(s, bmap_n, end, &next))
> > break;
> >}
>
> >If the window is too small, we end up looping back to the top and
> >trying to find another one. Since the overwhelming majority of the
> >windows are too small, we go through just about all of the bitmaps
> >without ever backing off the window size.
>
>
> To be clear, eventually the allocations are honored, but only after
> *all* of the bitmaps are searched. On the third pass, we drop the window
> to a single block and restart the scan, eventually building a 32-block
> set that is probably quite fragmented. This occurs on every write, hence
> the huge performance hit.
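>
> To make that concrete, the control flow amounts to something like the
> sketch below. It's a deliberately simplified, made-up illustration of
> the behavior described above, operating on a raw in-memory bitmap; it
> is not the real scan_bitmap()/scan_bitmap_block() code, and the way
> the first two passes are treated here is my own simplification:
>
> /* grab up to "want" free blocks, but only out of windows that are at
>  * least "min" contiguous free bits wide; returns how many were taken */
> static unsigned long grab_from_windows(unsigned char *map,
>                                        unsigned long bits,
>                                        unsigned long min,
>                                        unsigned long want)
> {
>     unsigned long i = 0, got = 0;
>
>     while (i < bits && got < want) {
>         if (map[i >> 3] & (1u << (i & 7))) {    /* block in use */
>             i++;
>             continue;
>         }
>         unsigned long start = i;                /* free window starts */
>         while (i < bits && !(map[i >> 3] & (1u << (i & 7))))
>             i++;
>         if (i - start < min)                    /* too small, move on */
>             continue;
>         while (start < i && got < want) {       /* claim the window */
>             map[start >> 3] |= 1u << (start & 7);
>             start++;
>             got++;
>         }
>     }
>     return got;
> }
>
> /* passes 0 and 1 insist on a window of "needed" blocks; only the third
>  * pass takes single blocks, so on a fragmented fs the first two passes
>  * walk every bitmap and come back empty-handed */
> static unsigned long allocate(unsigned char *map, unsigned long bits,
>                               unsigned long needed)
> {
>     unsigned long got = 0;
>     int pass;
>
>     for (pass = 0; pass < 3 && got < needed; pass++) {
>         unsigned long min = (pass < 2) ? needed : 1;
>         got += grab_from_windows(map, bits, min, needed - got);
>     }
>     return got;    /* comes up short only if we're truly out of space */
> }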
>
> It appears as though ext3 doesn't have this problem because they don't
> batch writes the way reiserfs does. They'll start a search at a decent
> hint the same way we do, but the window is always one block.
>
> So, we're stuck between a rock and a hard place. We can have the better
> allocation behavior while disk usage is low and sacrifice performance
> later, or we can have stable allocation behavior at the cost of an
> overall reduction in performance.
>
> I have an idea that may get around both problems, but I'm not sure how
> well it would be received. Right now, the allocator caches some very
> basic bitmap metadata, such as how many free blocks there are and where
> the first zero bit is. What if we also built an extent map of the free
> windows in each bitmap when we cache that metadata, and adjusted the
> map as blocks are allocated and freed? The allocator could then pick a
> suitably sized window without rescanning the bitmap. It would have to
> be a shrinkable cache, though, since the pathological case (single free
> blocks alternating with used ones) could mean occupying a significant
> amount of memory.
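>
> Purely to make the idea concrete, the cached state could look roughly
> like the sketch below; every name in it is hypothetical, nothing like
> it exists in the allocator today:
>
> #include <stddef.h>
>
> /* hypothetical per-bitmap cache of free windows (extents) */
> struct free_window {
>     unsigned int start;            /* first free block in the window  */
>     unsigned int len;              /* contiguous free blocks          */
> };
>
> struct bitmap_window_cache {
>     unsigned int nr_windows;       /* entries currently cached        */
>     unsigned int largest;          /* biggest window, for quick reject */
>     struct free_window *windows;   /* sorted by start; can be freed
>                                     * when the cache needs to shrink  */
> };
>
> /* return a window of at least "min" blocks, or NULL so the caller can
>  * fall back to scanning the bitmap the way it does today */
> static struct free_window *
> find_window(struct bitmap_window_cache *cache, unsigned int min)
> {
>     unsigned int i;
>
>     if (cache->largest < min)      /* nothing big enough is cached */
>         return NULL;
>     for (i = 0; i < cache->nr_windows; i++)
>         if (cache->windows[i].len >= min)
>             return &cache->windows[i];
>     return NULL;
> }
>
> Allocations and frees would split, merge, or drop entries, and under
> memory pressure the whole array could just be thrown away and rebuilt
> the next time the bitmap block is read.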
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs