Hi,
[EMAIL PROTECTED] (2005-03-02 at 2158.27 +0100):
My assumption here is that if the adaptive
supersampling code takes orders of magnitude longer to render
than without supersampling, it could be beneficial to
simply use the common code to render depth*depth
times the number of tiles to fill and simply do some
weighting on this data to fill the final tile. Very
easy, reuses existing code, runs multithreaded and is
likely quite a bit faster than the stuff now is.
With your idea, calculating the full 3000*3000 with a depth of 3 is
like calculating 9000*9000 (81 million pixels in RGB, 243*10^6 bytes
plus overhead), and in time it should be 9 times the 3000*3000
non-adaptive version plus the scale operation. To avoid absurd memory
usage, the code will have to be more complex than just rendering big and
then scaling down. It could sample multiple planes and average them (9
stacked tiles, each with a small offset for the gradient sampling).
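A rough sketch of that stacked-tile approach, written in Python for readability rather than in GIMP's C. It assumes a hypothetical `gradient_value(x, y)` that returns an RGB triple for a continuous coordinate (here just a linear black-to-white ramp). Each tile is rendered depth*depth times with a sub-pixel offset and accumulated into one buffer the size of the tile, so memory does not grow with depth. `naive_buffer_bytes` reproduces the 243*10^6 bytes of the render-big-then-scale version.

```python
def naive_buffer_bytes(width, height, depth, channels=3):
    """Bytes needed to render the whole image depth*depth times larger (8-bit channels)."""
    return (width * depth) * (height * depth) * channels


def gradient_value(x, y):
    """Hypothetical stand-in for the gradient: linear black to white along x, clamped to 0..255."""
    v = max(0.0, min(255.0, x))
    return (v, v, v)


def render_tile_stacked(x0, y0, tile_w, tile_h, depth, sample=gradient_value):
    """Render one tile depth*depth times with sub-pixel offsets and average the planes.

    Only one accumulator of tile size is kept, whatever the depth.
    """
    acc = [[[0.0, 0.0, 0.0] for _ in range(tile_w)] for _ in range(tile_h)]
    for sy in range(depth):
        for sx in range(depth):
            # Offset of this stacked plane: centre of sub-cell (sx, sy) inside each pixel.
            ox = (sx + 0.5) / depth
            oy = (sy + 0.5) / depth
            for j in range(tile_h):
                for i in range(tile_w):
                    r, g, b = sample(x0 + i + ox, y0 + j + oy)
                    cell = acc[j][i]
                    cell[0] += r
                    cell[1] += g
                    cell[2] += b
    n = depth * depth
    # Box filter: plain average of the depth*depth samples, rounded to 8-bit values.
    return [[tuple(int(c / n + 0.5) for c in cell) for cell in row] for row in acc]
```

The plain average is a box filter; weighting the planes differently (for example a tent filter) would only change the accumulation step, not the memory use.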
The current adaptive code is not parallel, but the algorithm, at the logic
level, is parallelizable in tiles, or in groups of tiles so as not to waste
so much at the edges. The other gradient methods were not parallel in older
versions anyway.
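To illustrate the edge waste, here is a sketch that assumes each group of pixels is rendered independently on its own (w*2^d+1) x (h*2^d+1) sample grid, so the sample rows and columns on shared borders get computed twice. The counts are worst case (every sample evaluated). `render_group` is a hypothetical placeholder job. In CPython the GIL keeps pure-Python threads from really running the math in parallel, so only the work split is shown.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_DEPTH = 3  # assumed max level, as in the tests discussed here


def grid_points(w_px, h_px, max_depth=MAX_DEPTH):
    """Samples on the full (w*s+1) x (h*s+1) grid of a block, s = 1 << max_depth."""
    s = 1 << max_depth
    return (w_px * s + 1) * (h_px * s + 1)


def split_into_groups(width, height, group):
    """Yield (x, y, w, h) rectangles of at most group x group pixels."""
    for y in range(0, height, group):
        for x in range(0, width, group):
            yield (x, y, min(group, width - x), min(group, height - y))


def edge_overhead(width, height, group, max_depth=MAX_DEPTH):
    """Worst-case samples with independent groups, relative to one shared grid.

    Independent groups recompute the sample row/column on every shared border.
    """
    independent = sum(grid_points(w, h, max_depth)
                      for _, _, w, h in split_into_groups(width, height, group))
    return independent / grid_points(width, height, max_depth)


def render_group(rect):
    """Hypothetical per-group job: here it only returns its worst-case sample count."""
    _, _, w, h = rect
    return grid_points(w, h)


def render_parallel(width, height, group, workers=4):
    """Hand every group to a thread pool independently and add up the work done."""
    rects = list(split_into_groups(width, height, group))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(render_group, rects))
```

With max level 3 on a 2000*2000 image, `edge_overhead(2000, 2000, 1)` (every pixel on its own) is about 1.265, roughly 26.5% extra samples. With 64*64 groups it drops below 1.004, which is why grouping tiles matters.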
So I did some rough tests: 2000*2000 with adaptive vs 6000*6000
without adaptive (9000 was too much for my computer, so I tried 2000 and
6000, the same 1:3 ratio and still big). Small with adaptive took 10.3 sec
and big without adaptive took 9.6 sec for a linear black-to-white gradient,
corner to corner or side to side.
Using the rainbow gradient from my other post, a 60 pixel sawtooth (or
180 pixels for the big version) side to side or up-down, times were
27.9 (*) and 32.1 sec. With distance 400 (or the equivalent 1200) they
were 13.2 and 32.2 sec. With 1200 (or 3600) I got 13.0 and 32.1 sec.
[* Interesting detail: the progress bar changed speed constantly when
doing the gradient vertically but not horizontally. My guess is that I
was seeing the code compute the changing parts with more samples and
go faster over the flat parts.]
Same gradient again, but dragging with the pointer a diagonal that the
status bar reported as 21 21 (or 63 63): times were 65.2 and 32.2 sec. For
diagonal 10 10 (or 30 30), times were 85.2 and 32.1 sec. For diagonal 5 5
(or 15 15), 112.7 and 32.2 sec (this is the case where the rainbow ends up
as muddy lines).
Then you have to add the scaling time to the big version, around 4 sec
with linear or cubic and 9 sec with lanczos on my computer (I did not
put a timer around that call, sorry, I used a wrist chrono). So, for the
rainbow gradient, rendering big and scaling down seems to take a fixed
36 or 41 sec, whatever the direction or period.
Your idea does not seem to be always faster: it is nowhere near the
magical 10x order of magnitude, only about 3x in the extreme cases, and
it is a big memory hog if done naively. The only cases in which it is
faster are those where adaptive has to calculate all the samples anyway,
because then the test overhead is a complete waste.
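The same comparison, redone with the measured times. Each case pairs the 2000*2000 adaptive time with the 6000*6000 non-adaptive time. The scale-down times are the hand-timed 4 and 9 sec. A ratio above 1 means render-big-then-scale wins.

```python
# Wall-clock seconds quoted above: (case, 2000*2000 adaptive, 6000*6000 non-adaptive)
MEASURED = [
    ("linear black to white", 10.3, 9.6),
    ("rainbow sawtooth 60 / 180 px", 27.9, 32.1),
    ("rainbow 400 / 1200", 13.2, 32.2),
    ("rainbow 1200 / 3600", 13.0, 32.1),
    ("rainbow diagonal 21 21 / 63 63", 65.2, 32.2),
    ("rainbow diagonal 10 10 / 30 30", 85.2, 32.1),
    ("rainbow diagonal 5 5 / 15 15", 112.7, 32.2),
]

# Approximate (wrist-chrono) scale-down times for the big image.
SCALE_SECONDS = {"linear/cubic": 4.0, "lanczos": 9.0}


def speedup(adaptive, big, scale):
    """> 1: render big and scale down beats adaptive; < 1: adaptive wins."""
    return adaptive / (big + scale)


if __name__ == "__main__":
    for name, adaptive, big in MEASURED:
        ratios = ", ".join(f"{k}: {speedup(adaptive, big, s):.2f}x"
                           for k, s in SCALE_SECONDS.items())
        print(f"{name:32s} {ratios}")
```

Only the three diagonal cases come out above 1 with either scaler, topping out at about 3.1x (linear/cubic) or 2.7x (lanczos) for the 5 5 diagonal. The other four favour adaptive.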
===
Update after letting the mail rest for some hours:
I decided to reread the oversampling code and try to understand
the real meaning of max level, to check whether my long-term memory was
right that it is a power factor, not a multiplier, like in POVRay
(Quartic notes it was his inspiration). I needed a fresh recheck of the
algorithm.
I see sub_pixel_size = 1 << max_depth; which means level 3 splits each
pixel into 8 parts per axis and can do 9 * 9 = 81 samples for each pixel,
of which the bottom row and right column are shared with neighbor pixels.
To match this, your idea needs not 3 * 3 but more, in the order of
8 * 8 = 64 (remember this adaptive code reuses results, so let's stay
below 9*9 to be fair). Max level is the level of recursion, not the
number of subsamples per axis.
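The level arithmetic in a few lines, following `sub_pixel_size = 1 << max_depth`. The `unique` count is an amortised figure: the number of new samples per pixel once the bottom row and right column are shared with neighbours. That is where the 8 * 8 = 64 comparison comes from.

```python
def supersample_grid(max_depth):
    """Per-pixel sample counts for a max level, following sub_pixel_size = 1 << max_depth."""
    sub_pixel_size = 1 << max_depth        # subdivisions per axis
    per_axis = sub_pixel_size + 1          # sample points per axis, both edges included
    total = per_axis * per_axis            # worst case: every point of the pixel's grid
    # Amortised new samples per pixel once the bottom row and right column
    # are shared with the neighbouring pixels.
    unique = sub_pixel_size * sub_pixel_size
    return per_axis, total, unique


for level in (1, 2, 3):
    print(level, supersample_grid(level))
# 1 (3, 9, 4)
# 2 (5, 25, 16)
# 3 (9, 81, 64)
```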
http://www.povray.org/documentation/view/3.6.1/223/ has a graph about
how this adaptive recursive sampling is performed (the +AM2 switch
method).
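Below is a simplified sketch of the recursive corner subdivision shown in that POV-Ray diagram. It illustrates the idea only and is not GIMP's actual code. `sample` is a hypothetical scalar function, and `threshold` is compared against the spread of the four corner values. A cell whose corners agree is averaged at once. Otherwise it is split into four, down to `max_depth` levels, and a shared cache lets neighbouring cells and pixels reuse corner samples.

```python
def adaptive_pixel(sample, x, y, max_depth, threshold, cache):
    """Value of the unit pixel at integer (x, y) using recursive corner subdivision.

    sample(px, py) -> float is the (hypothetical) function being anti-aliased.
    cache memoises samples so points shared by cells or pixels are evaluated once.
    """
    s = 1 << max_depth  # finest grid: s x s sub-cells per pixel

    def at(ix, iy):
        # Integer grid coordinates keep cache keys exact.
        key = (x * s + ix, y * s + iy)
        if key not in cache:
            cache[key] = sample(key[0] / s, key[1] / s)
        return cache[key]

    def cell(ix, iy, size, depth):
        corners = (at(ix, iy), at(ix + size, iy),
                   at(ix, iy + size), at(ix + size, iy + size))
        if depth < max_depth and max(corners) - min(corners) > threshold:
            half = size // 2
            return (cell(ix, iy, half, depth + 1) + cell(ix + half, iy, half, depth + 1)
                    + cell(ix, iy + half, half, depth + 1)
                    + cell(ix + half, iy + half, half, depth + 1)) / 4
        return sum(corners) / 4  # corners agree (or max depth reached): average them

    return cell(0, 0, s, 0)


flat_cache = {}
print(adaptive_pixel(lambda px, py: 1.0, 0, 0, 3, 0.1, flat_cache), len(flat_cache))
edge_cache = {}
step = lambda px, py: 1.0 if px >= 0.5 else 0.0
print(adaptive_pixel(step, 0, 0, 3, 0.1, edge_cache), len(edge_cache))
```

A flat region costs only the 4 corner samples, while a hard edge forces subdivision along it. That difference is the time saving adaptive sampling is built for. Averaging corners is a crude estimate (the step example gives 0.5625 rather than 0.5), so a real implementation may weight the samples differently.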
I am not going to explain quantization errors, antialiasing or
anything else again in this thread; it seems to only waste my time
demonstrating again and again things people have been doing for
years in other programs. Adaptive or recursive sampling is not something
people added just to have buzzwords, but a time saver for when
only a small (or even medium, that is, most of the time) set of
pixels really requires oversampling. In Gimp, and in POVRay, it is nice
to be able to disable it when you can go with normal sampling
(smooth gradients, test renders, etc.); it is undoubtedly faster, as
it avoids the checks. When you want oversampling, the adaptive one is
faster than full sampling in many cases; otherwise it would have been
silly to design and code it in the first place.
Now, to finish it and confirm the absurdity of full sampling, let's do
a quick test with 16000 * 16000 (2000 * 8, the full-sampling equivalent
of level 3)... better not, the dialog asked for 1.91 GB. So 8000*8000 and
multiply by four; it seems to be linear when not using adaptive anyway.
55.1 sec for the gradients and 6 or 10 for scaling. Times four, that is
over 240 sec, more than twice the 112.7 sec of the worst adaptive case.
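The same extrapolation written out as arithmetic. 16000 is 2000 * 8, i.e. full sampling at level-3 density. The 8000*8000 times are multiplied by the pixel ratio on the assumption, observed above, that non-adaptive time is linear in the pixel count. The 6 and 10 sec are the two measured scaling times.

```python
# Full-sampling equivalent of max level 3 for the 2000*2000 tests: 8 samples per axis.
SIDE = 2000 * (1 << 3)           # 16000
RENDER_8000 = 55.1               # measured seconds at 8000*8000, no adaptive
SCALE_8000 = (6.0, 10.0)         # the two measured scaling times at 8000*8000
WORST_ADAPTIVE = 112.7           # slowest 2000*2000 adaptive case measured earlier


def estimate_full(scale_seconds):
    """Assume time grows linearly with pixel count (16000^2 is 4x 8000^2)."""
    factor = (SIDE * SIDE) / (8000 * 8000)
    return factor * (RENDER_8000 + scale_seconds)


for s in SCALE_8000:
    t = estimate_full(s)
    print(f"with {s:.0f} s scaling: ~{t:.0f} s, {t / WORST_ADAPTIVE:.1f}x the worst adaptive case")
```

This gives roughly 244 and 260 sec, between 2 and 2.5 times the slowest adaptive render.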
The slow test would be repeating all the 2000*2000 tests with adaptive
using max level of 1 or 2 (which is when the quality is going to be
similar, 1