I finished adding SSE2 optimizations for the Inverse DWT decoding
routines this evening.
Here are the current performance numbers from my Atom D510 test system:
Without SSE:
|---|
PROFILER |
Hey Vic,
On 6/10/2011 12:32 AM, Vic Lee wrote:
Hi Steve,
Yes, both are faster, but the SSE version is still quite a bit slower than the
original one. Here is my testing.
Before pulling:
| rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.75 | 0.000824 |
| rfx_decode_YCbCr_to_RGB |
Vic,
On 6/10/2011 4:16 AM, Martin Fleisz wrote:
I am not quite sure how those _mm_* functions work internally, but if
they are really function calls, that will definitely hurt performance. I
think using SSE2 assembly instructions directly (like paddw) should be
much better.
Vic
The _mm_*
On 6/10/2011 10:59 AM, S. Erisman wrote:
The _mm_* functions _do_ indeed get compiled down to SSE assembly
instructions.
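To illustrate the point about intrinsics, here is a minimal sketch of a 16-bit add written with SSE2 intrinsics. The helper name add_i16_sse2 is hypothetical and is not taken from librfx. With optimization enabled, a compiler will typically emit a single paddw for _mm_add_epi16 rather than a function call.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of librfx): dst[i] = a[i] + b[i] on
 * 16-bit lanes, eight lanes per iteration. Overflow wraps around. */
void add_i16_sse2(int16_t *dst, const int16_t *a, const int16_t *b, size_t count)
{
    size_t i;

    assert(count % 8 == 0); /* this sketch has no scalar tail loop */

    for (i = 0; i < count; i += 8) {
        /* Unaligned loads, so the caller's buffers need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* _mm_add_epi16 maps to the paddw instruction. */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }
}
```

Disassembling the optimized object file (for example with objdump -d) is a quick way to check what the compiler actually generated.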
For reference... Here is what the non-SSE code compiles down to:
rfx_decode_YCbCr_to_RGB():
0:55 push %ebp
1:31 d2
, S. Erisman wrote:
Hey Vic,
On 6/10/2011 12:32 AM, Vic Lee wrote:
Hi Steve,
Yes, both are faster, but the SSE version is still quite a bit slower than the
original one. Here is my testing.
Before pulling:
| rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.75 | 0.000824
Martin,
On 6/9/2011 7:09 AM, Martin Fleisz wrote:
One thing that will definitely hurt performance is if our memory is
not 16-byte aligned. We should also make it possible to override the
memory allocation in rfx_pool to use _mm_malloc/_mm_free so that
buffers are correctly aligned.
We should
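As a rough sketch of the alignment suggestion above, the following allocates a 16-byte aligned buffer with _mm_malloc, which allows the aligned store _mm_store_si128 to be used on it. The function names are hypothetical and do not reflect the actual rfx_pool interface.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_malloc/_mm_free */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrappers (not the rfx_pool API): allocate and free a
 * buffer of 16-bit coefficients aligned to 16 bytes. */
int16_t *alloc_aligned_i16(size_t count)
{
    return (int16_t *)_mm_malloc(count * sizeof(int16_t), 16);
}

void free_aligned_i16(int16_t *buf)
{
    _mm_free(buf); /* memory from _mm_malloc must be released with _mm_free */
}

int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Fill an aligned buffer using aligned 128-bit stores. */
void fill_i16_aligned(int16_t *buf, size_t count, int16_t value)
{
    size_t i;
    __m128i v = _mm_set1_epi16(value);

    assert(is_aligned_16(buf)); /* _mm_store_si128 faults on unaligned addresses */
    assert(count % 8 == 0);     /* this sketch has no scalar tail loop */

    for (i = 0; i < count; i += 8)
        _mm_store_si128((__m128i *)(buf + i), v);
}
```

Buffers obtained from plain malloc are not guaranteed to be 16-byte aligned on every platform, which is why the allocation would need to be overridable.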
Vic,
On 6/9/2011 10:05 PM, Vic Lee wrote:
Hi Steve,
The RemoteFX algorithm does not specify the minimum required bits, but
according to a forum post on MSDN, MS's implementation uses 16-bit signed
integers, so I believe it should be enough.
Thanks for the response. I actually found my
On 6/9/2011 10:04 PM, S. Erisman wrote:
Vic,
On 6/9/2011 10:05 PM, Vic Lee wrote:
Hi Steve,
The RemoteFX algorithm does not specify the minimum required bits, but
according to a forum post on MSDN, MS's implementation uses 16-bit signed
integers, so I believe it should be enough.
Thanks
Marc,
On 6/7/2011 11:35 PM, Marc-André Moreau wrote:
Hi Steve,
I just tried your patch - awesome!
Thanks. That was the first SSE code I have ever written and it ended up
being pretty easy. Once we have high level agreement on the structure
needed around these optimizations there is
Marc,
I took your suggestions into account, revised my earlier patch, and
committed my changes to a new fork:
https://github.com/serisman/FreeRDP
... more comments below ...
On 6/7/2011 9:29 PM, Marc-André Moreau wrote:
Hi Steve,
Well, that was fast :) I had started thinking of the
Marc,
I vote to merge your github fork. I tried it out last night, and it
seems pretty stable.
Could you also include a fix for the fullscreen toggle (while using
RemoteFX) issue that I sent to the list the other day? It should be as
simple as clearing or resetting the clip region at the
Marc,
On 6/6/2011 9:20 AM, Marc-André Moreau wrote:
I read more about SSE, and then about NEON, which is the equivalent for
ARM.
My first impression is damn, how could I not see this before? This
thing looks very well suited not only for acceleration of RemoteFX
decoding, but there's a chance
Vic,
On 6/7/2011 7:18 PM, Vic Lee wrote:
Hi Steve,
I think this might affect more than just fullscreen toggling
(depending on the window manager, I guess it might happen in
other cases). This patch should fix it more properly.
diff --git a/X11/xf_decode.c b/X11/xf_decode.c
On 5/25/2011 8:42 PM, Vic Lee wrote:
Hi,
I have finally completed the RemoteFX software decoding feature. It's written
as a separate and relatively independent library, librfx. I only added it
to xfreerdp, but the library is portable, so there shouldn't be a problem
using it in other UIs.
I have
On 6/5/2011 9:50 PM, Otavio Salvador wrote:
On Mon, Jun 6, 2011 at 02:25, S. Erisman <seris...@serisman.com> wrote:
I tried out your RemoteFX code over the weekend, and it works very nicely.
It didn't work for me. How did you configure the Windows Server to use
it? What worries me is that the
On 3/2/2011 8:13 AM, Marc-André Moreau wrote:
By regular hardware, do you mean hardware that does not include the
special RemoteFX chip? Adding RemoteFX support without the chip means
implementing the codec in software, and that is way too much for a
student to do in a summer. I have