[Freerdp-devel] Update on SSE2 for RemoteFX

2011-06-15 Thread S. Erisman
I finished adding SSE2 optimizations for the Inverse DWT decoding routines this evening. Here are the current performance numbers from my Atom D510 test system: Without SSE: |---| PROFILER |

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
Hey Vic, On 6/10/2011 12:32 AM, Vic Lee wrote: Hi Steve, Yes, both are faster, but the SSE version is still quite a bit slower than the original one. Here is my testing. Before pulling: | rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.75 | 0.000824 | | rfx_decode_YCbCr_to_RGB |

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
Vic, On 6/10/2011 4:16 AM, Martin Fleisz wrote: I am not quite sure how those _mm_* functions work internally, but if they are really functions, it will definitely hurt performance. I think using the assembly SSE2 instruction set directly (like paddw) should be much better. Vic The _mm_*

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
On 6/10/2011 10:59 AM, S. Erisman wrote: The _mm_* functions _do_ indeed get compiled down to SSE assembly instructions. For reference... Here is what the non-SSE code compiles down to: rfx_decode_YCbCr_to_RGB(): 0:55 push %ebp 1:31 d2

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-10 Thread S. Erisman
, S. Erisman wrote: Hey Vic, On 6/10/2011 12:32 AM, Vic Lee wrote: Hi Steve, Yes, both are faster, but the SSE version is still quite a bit slower than the original one. Here is my testing. Before pulling: | rfx_decode_YCbCr_to_RGB_SSE2 | 2123 | 1.75 | 0.000824

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
Martin, On 6/9/2011 7:09 AM, Martin Fleisz wrote: One thing that will definitely hurt performance is if our memory is not 16-byte aligned. We should also have the possibility of overriding the memory allocation in rfx_pool to use _mm_malloc/_mm_free so that buffers are correctly aligned. We should

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
Vic, On 6/9/2011 10:05 PM, Vic Lee wrote: Hi Steve, The RemoteFX algorithm does not specify the minimum required bits, but according to a forum post in MSDN, MS's implementation uses 16-bit signed integers, so I believe it should be enough. Thanks for the response. I actually found my

Re: [Freerdp-devel] RemoteFX Profiler: First results...

2011-06-09 Thread S. Erisman
On 6/9/2011 10:04 PM, S. Erisman wrote: Vic, On 6/9/2011 10:05 PM, Vic Lee wrote: Hi Steve, The RemoteFX algorithm does not specify the minimum required bits, but according to a forum post in MSDN, MS's implementation uses 16-bit signed integers, so I believe it should be enough. Thanks

Re: [Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-08 Thread S. Erisman
Marc, On 6/7/2011 11:35 PM, Marc-André Moreau wrote: Hi Steve, I just tried your patch - awesome! Thanks. That was the first SSE code I have ever written and it ended up being pretty easy. Once we have high level agreement on the structure needed around these optimizations there is

Re: [Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-08 Thread S. Erisman
Marc, I took your suggestions into account, revised my earlier patch, and committed my changes to a new fork: https://github.com/serisman/FreeRDP ... more comments below ... On 6/7/2011 9:29 PM, Marc-André Moreau wrote: Hi Steve, Well, that was fast :) I had started thinking of the

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-07 Thread S. Erisman
Marc, I vote to merge your github fork. I tried it out last night, and it seems pretty stable. Could you also include a fix for the fullscreen toggle (while using RemoteFX) issue that I sent to the list the other day? It should be as simple as clearing or resetting the clip region at the

[Freerdp-devel] RemoteFX SSE/SSE2 decoding (was RemoteFX software decoding)

2011-06-07 Thread S. Erisman
Marc, On 6/6/2011 9:20 AM, Marc-André Moreau wrote: I read more about SSE, and then about NEON, which is the equivalent for ARM. My first impression is: damn, how could I not see this before? This thing looks very well suited not only for acceleration of RemoteFX decoding, but there's a chance

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-07 Thread S. Erisman
Vic, On 6/7/2011 7:18 PM, Vic Lee wrote: Hi Steve, I think it might not affect fullscreen toggling only (depending on the window manager, I guess it might happen in other cases). This patch should fix it more properly. diff --git a/X11/xf_decode.c b/X11/xf_decode.c

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-05 Thread S. Erisman
On 5/25/2011 8:42 PM, Vic Lee wrote: Hi, I have finally completed the RemoteFX software decoding feature. It's written as a separate and relatively independent library, librfx. I only added it to xfreerdp, but the library is portable, so there shouldn't be a problem using it in other UIs. I have

Re: [Freerdp-devel] RemoteFX software decoding

2011-06-05 Thread S. Erisman
On 6/5/2011 9:50 PM, Otavio Salvador wrote: On Mon, Jun 6, 2011 at 02:25, S. Erisman seris...@serisman.com wrote: I tried out your RemoteFX code over the weekend, and it works very nicely. It didn't work for me. How did you configure the Windows Server to use it? What worries me is that the

Re: [Freerdp-devel] Google Summer of Code 2011

2011-03-02 Thread S. Erisman
On 3/2/2011 8:13 AM, Marc-André Moreau wrote: By regular hardware, do you mean hardware that does not include the special RemoteFX chip? Adding RemoteFX support without the chip means implementing the codec in software, and that is way too much for a student to do in a summer. I have