Quoting Jaroslav Kysela <[EMAIL PROTECTED]>:
> On Thu, 20 Feb 2003, Abramo Bagnara wrote:
>
> > Now I'm able to get the same results you see.
> >
> > However I think that we need to extract some results from this data.
> >
> > I'll leave alone MMX optimizations because I want to compare apples
Jaroslav Kysela wrote:
>
> On Thu, 20 Feb 2003, Abramo Bagnara wrote:
>
> > Now I'm able to get the same results you see.
> >
> > However I think that we need to extract some results from this data.
> >
> > I'll leave alone MMX optimizations because I want to compare apples with
> > apples.
> >
>
Paul Davis wrote:
>
> >The server based approach has an added cost of an extra context switch
> >every period (about 1500 cycles on my machine, for example), but this is fully
> >amortized by such a huge difference.
>
> recall that (1) the context switch time is not a fixed cost but
Mine was only a ver
On Thu, 20 Feb 2003, tomasz motylewski wrote:
> Do I understand correctly that the server stores data in a 32-bit buffer and
> then puts it in the 16-bit DMA buffer of the card? This is one more operation
> compared with mixing directly in the DMA buffer.
There is no server and 32-bit buffer is used for
Jaroslav:
> I think that we can lose more in the client/server model. Also, note that
> client/server will have higher latency. The server has to copy the samples
> "last minute" to DMA buffer and the client has to manage before the server
> copies the data. In the direct model only the client's timing
On Thu, 20 Feb 2003, Abramo Bagnara wrote:
> Now I'm able to get the same results you see.
>
> However I think that we need to extract some results from this data.
>
> I'll leave alone MMX optimizations because I want to compare apples with
> apples.
>
> The distributed saturation (also when it
>The server based approach has an added cost of an extra context switch
>every period (about 1500 cycles on my machine, for example), but this is fully
>amortized by such a huge difference.
recall that (1) the context switch time is not a fixed cost but
depends on the memory behaviour between switches an
Jaroslav Kysela wrote:
>
> On Thu, 20 Feb 2003, Abramo Bagnara wrote:
>
> > Jaroslav Kysela wrote:
> > >
> > > On Wed, 19 Feb 2003, Abramo Bagnara wrote:
> > >
> > > > > The results are amazing and I'd say Jaroslav has made some mistakes in
> > > > his handmade asm.
> > >
> > > I don't think so. It
On Thu, 20 Feb 2003, Abramo Bagnara wrote:
> Jaroslav Kysela wrote:
> >
> > On Wed, 19 Feb 2003, Abramo Bagnara wrote:
> >
> > > The results are amazing and I'd say Jaroslav has made some mistakes in
> > > his handmade asm.
> >
> > I don't think so. It seems that my brain still remembers assemb
Jaroslav Kysela wrote:
>
> On Wed, 19 Feb 2003, Abramo Bagnara wrote:
>
> > The results are amazing and I'd say Jaroslav has made some mistakes in
> > his handmade asm.
>
> I don't think so. It seems that my brain still remembers assembler ;-)
> You passed wrong values to my code so it did unali
Jaroslaw Sobierski wrote:
>
>
> s16 s=sample;
> if (unlikely(sample != (s32)s))
>
I tested exactly this yesterday evening, but it's less efficient
than an ordinary boundary check.
--
Abramo Bagnara mailto:[EMAIL PROTECTED]
Opera Unica
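
The two saturation checks compared in the message above can be sketched as
follows. This is only an illustration, not the code that was benchmarked on
the list: the function names are hypothetical, `unlikely` is defined here the
way GCC/Clang code commonly defines it, and the round-trip variant assumes the
compiler wraps out-of-range conversions to 16 bits (implementation-defined in
C, true for GCC and Clang).

```c
#include <stdint.h>

/* Branch hint as commonly defined for GCC/Clang; an assumption here. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Variant quoted in the thread: truncate to 16 bits and detect overflow
 * by checking whether the value survives the round trip. */
static int16_t saturate_roundtrip(int32_t sample)
{
    int16_t s = (int16_t)sample;   /* wraps on GCC/Clang when out of range */
    if (unlikely(sample != (int32_t)s))
        s = sample > 0 ? INT16_MAX : INT16_MIN;
    return s;
}

/* "Ordinary boundary check": compare against both limits explicitly. */
static int16_t saturate_bounds(int32_t sample)
{
    if (unlikely(sample > INT16_MAX))
        return INT16_MAX;
    if (unlikely(sample < INT16_MIN))
        return INT16_MIN;
    return (int16_t)sample;
}
```

Both functions return the same result for every 32-bit input; which one is
faster depends on the compiler and CPU, so it has to be measured.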
Quoting Jaroslav Kysela <[EMAIL PROTECTED]>:
> I don't think so. It seems that my brain still remembers assembler ;-)
...
> sample = *sum;
> s16 s;
> - if (unlikely(sample & 0x))
> + if (unlikely(sample & 0
Jaroslav Kysela wrote:
>
> On Wed, 19 Feb 2003, Abramo Bagnara wrote:
>
> > The results are amazing and I'd say Jaroslav has made some mistakes in
> > his handmade asm.
>
> I don't think so. It seems that my brain still remembers assembler ;-)
I've no doubts about that ;-)
> You passed wrong v
On Wed, 19 Feb 2003, Jaroslav Kysela wrote:
> perex@pnote:~> cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 8
> model name : Pentium III (Coppermine)
> stepping: 6
> cpu MHz : 847.473
> cache size : 256 K
Jaroslaw Sobierski wrote:
>
> Quoting Abramo Bagnara <[EMAIL PROTECTED]>:
> >
> > The results are amazing and I'd say Jaroslav has made some mistakes in
> > his handmade asm.
> >
>
> This may be true, but I think you're trying to be a little too quick yourself.
No doubts about that, I was in a h
On Wed, 19 Feb 2003, Abramo Bagnara wrote:
> The results are amazing and I'd say Jaroslav has made some mistakes in
> his handmade asm.
I don't think so. It seems that my brain still remembers assembler ;-)
You passed wrong values to my code so it did unaligned accesses.
Fixes to make things sam
Quoting Abramo Bagnara <[EMAIL PROTECTED]>:
>
> The results are amazing and I'd say Jaroslav has made some mistakes in
> his handmade asm.
>
This may be true, but I think you're trying to be a little too quick yourself.
Did you *test* your code? I only had time to take a short glance at it, but
Abramo Bagnara wrote:
>
> Jaroslav Kysela wrote:
> >
> > On Wed, 19 Feb 2003, Abramo Bagnara wrote:
> >
> > > Jaroslav Kysela wrote:
> > > >
> > > > I've implemented the whole transfer and mix loop in assembly and it works
> > > > without any drastic impact on CPU usage. I tried to optimize the as
Jaroslav Kysela wrote:
>
> On Wed, 19 Feb 2003, Abramo Bagnara wrote:
>
> > Jaroslav Kysela wrote:
> > >
> > > I've implemented the whole transfer and mix loop in assembly and it works
> > > without any drastic impact on CPU usage. I tried to optimize the assembler
> > > part as much as I can, bu
On Wed, 19 Feb 2003, Jaroslaw Sobierski wrote:
> Quoting Jaroslav Kysela <[EMAIL PROTECTED]>:
> >
> > I've implemented the whole transfer and mix loop in assembly and it works
> > without any drastic impact on CPU usage. I tried to optimize the assembler
> > part as much as I can, but if some ass
On Wed, 19 Feb 2003, Abramo Bagnara wrote:
> Jaroslav Kysela wrote:
> >
> > I've implemented the whole transfer and mix loop in assembly and it works
> > without any drastic impact on CPU usage. I tried to optimize the assembler
> > part as much as I can, but if some assembler guru wants to give a
Quoting Jaroslav Kysela <[EMAIL PROTECTED]>:
>
> I've implemented the whole transfer and mix loop in assembly and it works
> without any drastic impact on CPU usage. I tried to optimize the assembler
> part as much as I can, but if some assembler guru wants to take a look,
> I'll appreciate it. T
Jaroslav Kysela wrote:
>
> I've implemented the whole transfer and mix loop in assembly and it works
> without any drastic impact on CPU usage. I tried to optimize the assembler
> part as much as I can, but if some assembler guru wants to take a look,
> I'll appreciate it. The function is named m
Paul Davis wrote:
>
> >> Still, don't we already *have* a feeding thread for the sound card? I mean
> >> it doesn't just grab the memory buffer all by itself whenever it wants?
> >
> >Nope. The idea for the dmix plugin is that we share the DMA ring buffer
> >with more threads (processes). There is
On Tue, 18 Feb 2003, Abramo Bagnara wrote:
> Jaroslav Kysela wrote:
> >
> > On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> >
> > > >> I see, the read/saturate/write must be atomic, too. In this case, it would
> > > >> be better to use a global (or a set of) mutex(es) to lock the hardware
> > >
On Tue, 18 Feb 2003, Paul Davis wrote:
> >>v = *src;
> >>if (cmpxchg(hw, 0, 1) == 0)
> >>v -= *sw;
> >> xadd(sw, v);
> >> do {
> >> v = *sw;
> >> if (v > 0x7fff)
> >> s = 0x7fff;
> >> else i
>> v = *src;
>> if (cmpxchg(hw, 0, 1) == 0)
>> v -= *sw;
>> xadd(sw, v);
>> do {
>> v = *sw;
>> if (v > 0x7fff)
>> s = 0x7fff;
>> else if (v < -0x8000)
>> s = -0x80
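
The lock-free mixing step quoted above can be written out with C11 atomics
roughly as follows. This is a sketch under assumptions, not the code from the
thread: the quoted snippet is cut off, so the end of the loop (writing the
saturated value to `hw` and retrying while `*sw` changes) is a guess, the
function name `mix_sample` is hypothetical, and `<stdatomic.h>` stands in for
the `cmpxchg`/`xadd` primitives named in the message.

```c
#include <stdatomic.h>
#include <stdint.h>

/* hw: one 16-bit sample in the shared hardware ring buffer.
 * sw: the matching 32-bit slot in the shared sum buffer.
 * src: the sample this client wants to mix in. */
static void mix_sample(_Atomic int16_t *hw, _Atomic int32_t *sw, int16_t src)
{
    int32_t v = src;
    int16_t expected = 0;

    /* cmpxchg(hw, 0, 1): a zero hw sample is taken to mean "silenced",
     * so the first writer cancels whatever stale sum is left in sw. */
    if (atomic_compare_exchange_strong(hw, &expected, 1))
        v -= atomic_load(sw);

    /* xadd(sw, v): add our contribution to the exact sum atomically. */
    atomic_fetch_add(sw, v);

    /* Saturate and store; retry if another client changed the sum in the
     * meantime (assumed loop condition, see the lead-in). */
    int32_t seen;
    do {
        seen = atomic_load(sw);
        int16_t s;
        if (seen > 0x7fff)
            s = 0x7fff;
        else if (seen < -0x8000)
            s = -0x8000;
        else
            s = (int16_t)seen;
        atomic_store(hw, s);
    } while (seen != atomic_load(sw));
}
```

One consequence of this sketch is that a mixed result of exactly 0 looks like
a silenced slot to the next writer.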
On Tue, 18 Feb 2003, Jaroslaw Sobierski wrote:
> Quoting Jaroslav Kysela:
> [...]
> > >
> > > v = *src;
> > > if (cmpxchg(hw, 0, 1) == 0)
> > > v -= *sw;
> > > xadd(sw, v);
> > > do {
> > > v = *sw;
> > > if (v > 0x7fff)
> > >
Quoting Jaroslav Kysela:
[...]
> >
> > v = *src;
> > if (cmpxchg(hw, 0, 1) == 0)
> > v -= *sw;
> > xadd(sw, v);
> > do {
> > v = *sw;
> > if (v > 0x7fff)
> > s = 0x7fff;
> > else if (v < -0x
On Tue, 18 Feb 2003, Abramo Bagnara wrote:
> Jaroslav Kysela wrote:
> >
> > On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> >
> > > >> I see, the read/saturate/write must be atomic, too. In this case, it would
> > > >> be better to use a global (or a set of) mutex(es) to lock the hardware
> > >
Jaroslav Kysela wrote:
>
> On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
>
> > >> I see, the read/saturate/write must be atomic, too. In this case, it would
> > >> be better to use a global (or a set of) mutex(es) to lock the hardware
> > >> ring buffer. The futexes are nice.
> > >
> > >They are
>> On Mon, 17 Feb 2003, Jaroslav Kysela wrote:
>>
>> > Note that all your nice ideas lead into a blind alley. Who will silence the
>> > sum buffer? The driver silences only the hardware buffer, which will not be
>> > used for the calculation in your algorithm.
>>
>> Silencing is not time critical, if bu
On Mon, 17 Feb 2003, tomasz motylewski wrote:
> On Mon, 17 Feb 2003, Jaroslav Kysela wrote:
>
> > Note that all your nice ideas lead into a blind alley. Who will silence the
> > sum buffer? The driver silences only the hardware buffer, which will not be
> > used for the calculation in your algorithm.
>
On Mon, 17 Feb 2003, Jaroslav Kysela wrote:
> Note that all your nice ideas lead into a blind alley. Who will silence the
> sum buffer? The driver silences only the hardware buffer, which will not be
> used for the calculation in your algorithm.
Silencing is not time critical, if buffer is big enough it
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> >> I see, the read/saturate/write must be atomic, too. In this case, it would
> >> be better to use a global (or a set of) mutex(es) to lock the hardware
> >> ring buffer. The futexes are nice.
> >
> >They are nice indeed, but definitely not the rig
>> Still, don't we already *have* a feeding thread for the sound card? I mean
>> it doesn't just grab the memory buffer all by itself whenever it wants?
>
>Nope. The idea for the dmix plugin is that we share the DMA ring buffer
>with more threads (processes). There is no "master" thread which oper
>
>Well, but when adding a+b we have no idea that the overflow will be compensated
>by the next very large negative sample. Also, mixing signals which already fill
>90% of the dynamic range is not a good idea. My "fix" is heuristic - it works for
>occasional _small_ overflows like 0x4100+0x4000 -> 0x7fff is m
>> I see, the read/saturate/write must be atomic, too. In this case, it would
>> be better to use a global (or a set of) mutex(es) to lock the hardware
>> ring buffer. The futexes are nice.
>
>They are nice indeed, but definitely not the right solution here.
>
>Although I don't know if it's the abs
Jaroslav Kysela wrote:
>
> On Mon, 17 Feb 2003, Abramo Bagnara wrote:
>
> > You're wrong: xadd is atomic but xadd/read/saturation/write is not.
> >
> > Without the loop I've added you risk writing an *old* value to
> > hw_ring_buffer:
> >
> > A:          B:
> > xadd
> > read
> >
On Mon, 17 Feb 2003, Abramo Bagnara wrote:
> You're wrong: xadd is atomic but xadd/read/saturation/write is not.
>
> Without the loop I've added you risk writing an *old* value to
> hw_ring_buffer:
>
> A:          B:
> xadd
> read
>             xadd
>             read
> satu
Jaroslav Kysela wrote:
>
> On Mon, 17 Feb 2003, Abramo Bagnara wrote:
>
> > Jaroslav Kysela wrote:
> > >
> > > On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> > >
> > > > > > b) sum overflow: we can lower volume of samples before sum; I think that
> > > > > >hardware works in this way, too
>
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> Does the problem lie in the fact that it is actually a plugin and has
> no control of the transfer? Maybe it would be worth considering a callback
> for the plugin from the main ALSA module to inform it that a new piece
> of the DMA buffer must be "
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> Still, don't we already *have* a feeding thread for the sound card? I mean
> it doesn't just grab the memory buffer all by itself whenever it wants?
Nope. The idea for the dmix plugin is that we share the DMA ring buffer
with more threads (process
Abramo Bagnara wrote:
>If we needed an intermediate buffer and a mixing thread, the dmix
>approach would lose its appeal.
>
>A solution might be to have a shared parallel sw ring buffer in which to
>store the exact value:
>
>xadd(sw, *src);
> do {
> v = *sw;
>
On Mon, 17 Feb 2003, Abramo Bagnara wrote:
> Jaroslav Kysela wrote:
> >
> > On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> >
> > > > > b) sum overflow: we can lower volume of samples before sum; I think that
> > > > >hardware works in this way, too
> > > >
> > > > Here I don't understand y
Jaroslav Kysela wrote:
>
> On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
>
> > > > b) sum overflow: we can lower volume of samples before sum; I think that
> > > >hardware works in this way, too
> > >
> > > Here I don't understand you. Suppose we have 3 samples to mix:
> > > a = 0x7500
> > >
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> >> In our case, such "solution" would have to affect the whole buffer, meaning
> >> we would need 3 (or better yet 4) bytes per sample, which would eventually get
> >> reduced back to 2 bytes on the way out to the sound card. This seems neither
> >
>> In our case, such "solution" would have to affect the whole buffer, meaning
>> we would need 3 (or better yet 4) bytes per sample, which would eventually get
>> reduced back to 2 bytes on the way out to the sound card. This seems neither
>> elegant nor memory efficient but would work, and also
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> > Here I don't understand you. Suppose we have 3 samples to mix:
> > a = 0x7500
> > b = 0x7400
> > c = 0x8300
> >
> > If you do a + b + c (in this order) you obtain:
> > d=0
> > d += a -> 7500
> > d += b -> 0xe900 -> 0x7fff
> > d += c -> 0x02ff
> >
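
The arithmetic in the quoted example can be checked with a small sketch. The
function names are hypothetical; `c = 0x8300` is written as the signed 16-bit
value -0x7d00. Saturating after every addition gives 0x02ff for the order
a, b, c, while summing in a wider accumulator and saturating once gives
0x6c00 regardless of order.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Mix directly in a 16-bit buffer: saturate after each addition.
 * The result depends on the order in which samples arrive. */
static int16_t mix_saturating_each_step(const int16_t *samples, size_t n)
{
    int16_t d = 0;
    for (size_t i = 0; i < n; i++)
        d = clamp16((int32_t)d + samples[i]);
    return d;
}

/* Mix in a 32-bit sum buffer and saturate once at the end.
 * The result is independent of the order. */
static int16_t mix_wide_then_saturate(const int16_t *samples, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return clamp16(sum);
}
```

With the order a, c, b the per-step version also yields 0x6c00, which shows
that direct 16-bit mixing is order-dependent while the wide sum is not.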
On Mon, 17 Feb 2003, Jaroslaw Sobierski wrote:
> > > b) sum overflow: we can lower volume of samples before sum; I think that
> > >hardware works in this way, too
> >
> > Here I don't understand you. Suppose we have 3 samples to mix:
> > a = 0x7500
> > b = 0x7400
> > c = 0x8300
> >
> > If yo
> > b) sum overflow: we can lower volume of samples before sum; I think that
> >hardware works in this way, too
>
> Here I don't understand you. Suppose we have 3 samples to mix:
> a = 0x7500
> b = 0x7400
> c = 0x8300
>
> If you do a + b + c (in this order) you obtain:
> d=0
> d += a -> 7500