>> + int bytes_per_row = ((width + 31) / 32) * 4;
>
> I think there's a QEMU_ALIGN macro for that, maybe in qemu/osdep.h?
>
>> +
>> + for (i = 0; i < height; i++) {
>> + for (j = 0; j < width; j++) {
>> + byte = mono_src[i * bytes_per_row + (j / 8)];
>> + bit = lsb_to_msb ? 7 - (j % 8) : j % 8;
>> + color = (byte >> bit) & 0x1 ? fg_color : bg_color;
>> + pixel = &color_dst[(i * width + j) * 4];
>> + memcpy(pixel, &color, sizeof(color));
>
> Since it's just writing a 32-bit value, maybe a cast and = would be faster
> than calling memcpy for this.
>
Yep, good call! I'll make these changes in v3 also.