Further tests suggest that you can do it on a log2 scale, but that
appropriate window sizes are crucial.
But I am not sure how to derive these optimal window sizes.
I could calculate the bandwidth of the octave band (or an octave/N band)
in ERB, for instance, but then what? How do I derive a

I think I figured it out.
I use 2^octave * SR/FFTsize -> toERBscale -> * log2(FFTsize)/42 as a
scaling factor for the windows.
That means the window of the top octave is about 367 samples at
SR = 44100 - does that seem right?
Sounds better but not so different, still pretty blurry and somewhat
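
For anyone who wants to try the formula above, here is a minimal sketch of one possible reading of it. Several things are assumptions and not stated in the post: that "toERBscale" means the Glasberg & Moore ERB-rate formula 21.4 * log10(1 + 0.00437 * f), that the scaling factor divides the FFT size to give the window length, and that the FFT size is 4096 (the post does not give one). The function names are made up for illustration, and it is not expected to reproduce the 367 samples exactly.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (ERB number) of a frequency in Hz.

    Assumption: Glasberg & Moore formula; the post only says "toERBscale".
    """
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def window_scaling(octave, sr, fft_size):
    """Scaling factor as written in the post:
    2^octave * SR/FFTsize -> toERBscale -> * log2(FFTsize)/42
    """
    f_hz = (2 ** octave) * sr / fft_size
    return hz_to_erb_rate(f_hz) * math.log2(fft_size) / 42.0


def window_size(octave, sr, fft_size):
    """Window length in samples.

    Assumption: the scaling factor divides the FFT size; the post does
    not say how the factor is applied.
    """
    return fft_size / window_scaling(octave, sr, fft_size)


if __name__ == "__main__":
    sr = 44100
    fft_size = 4096  # hypothetical, not given in the post
    top = int(math.log2(fft_size)) - 1  # octave whose frequency is SR/2
    for octave in range(top - 3, top + 1):
        size = window_size(octave, sr, fft_size)
        print(f"octave {octave:2d}: window of about {size:.0f} samples")
```

Under this reading the window shrinks as the octave goes up, which matches the "decreasing window sizes" described in the thread. The lowest octaves come out longer than the FFT itself, so they would need separate handling.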

On 7/11/2018 12:03 AM, gm wrote:
A similar idea would be to do some basic wavelet transform in octaves,
for instance, and then
do smaller FFTs on the bands to stretch and shift them, but I have no idea
if you can do that - if you shift them you exceed their band limit, I assume?
and if you stretch

At the moment I am using decreasing window sizes on a log2 scale.
It's still pretty blurred, and I don't know whether I just don't have the
right window parameters,
whether a log2 scale is too coarse and differs too much from an auditory
scale, or whether I don't have
enough overlaps in resynthesis