Re: [flac-dev] Performance checks pre-release

2014-11-26 Thread Erik de Castro Lopo
Martijn van Beurden wrote:

 For anyone wondering, here's a PDF comparing encoding speed, 
 decoding speed and compression between FLAC 1.2.1, 1.3.0 and 
 1.3.1pre1.
 
 Compiled on an Intel Core 2 Duo T9600 (SSE up to and including
 4.1, no AVX), Kubuntu 14.04.1, GCC 4.9.1.

Awesome! Thanks Martijn!

Erik
-- 
Erik de Castro Lopo
http://www.mega-nerd.com/
___
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev


[flac-dev] Performance checks pre-release

2014-11-25 Thread Martijn van Beurden
For anyone wondering, here's a PDF comparing encoding speed, 
decoding speed and compression between FLAC 1.2.1, 1.3.0 and 
1.3.1pre1.


Compiled on an Intel Core 2 Duo T9600 (SSE up to and including
4.1, no AVX), Kubuntu 14.04.1, GCC 4.9.1.


Attachment: long set of samples-1.3.1pre1.pdf (Adobe PDF document)


Re: [flac-dev] Performance checks

2014-07-03 Thread Miroslav Lichvar
On Wed, Jul 02, 2014 at 10:18:58PM +0200, Martijn van Beurden wrote:
 http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Linux-GCC-4.8.pdf
 http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Wine-MSVC-2013.pdf
 
 For the GCC 4.8 results, there is a *very nice 60% to 70% speed increase*
 when encoding with preset -8 between FLAC 1.3.0 and current git.

That's indeed a significant improvement. I'm assuming most of it is
from the SSE intrinsic code. Good job, lvqcl!

-- 
Miroslav Lichvar


[flac-dev] Performance checks

2014-07-02 Thread Martijn van Beurden

Hi all,

I thought it would be a good idea to get an overview of the
developments since the release of 1.3.0, so here are a few graphs.


The first was compiled with GCC 4.8, the second with MSVC 2013.
Both were tested on a Kubuntu 14.04 machine with an Intel Core 2
Duo T9600 (SSE support up to version 4.1). The MSVC builds were
run through Wine, as I don't think running in a virtual machine
would give reproducible results. Because of the use of Wine,
however, the MSVC results are no more than an indication.


http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Linux-GCC-4.8.pdf
http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Wine-MSVC-2013.pdf

For the GCC 4.8 results, there is a *very nice 60% to 70% speed
increase* when encoding with preset -8 between FLAC 1.3.0 and
current git.
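
To put that figure in terms of encoding time, here is a small sketch. It assumes "speed increase" means throughput (audio encoded per second of wall-clock time), which the message does not spell out. The helper name time_fraction is made up for illustration.

```python
def time_fraction(speed_increase_pct):
    """Fraction of the original encoding time left after a throughput gain.

    Assumes speed is throughput, so time scales as 1 / speed.
    """
    return 1.0 / (1.0 + speed_increase_pct / 100.0)


# The two ends of the range quoted for preset -8.
low = time_fraction(60)
high = time_fraction(70)

for pct, frac in ((60, low), (70, high)):
    saved = (1.0 - frac) * 100.0
    print(f"{pct}% faster -> {frac * 100.0:.1f}% of the old time ({saved:.1f}% less)")
```

Under that assumption, a 60% to 70% speed increase corresponds to roughly 37% to 41% less encoding time.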






[flac-dev] Performance checks

2013-12-19 Thread Martijn van Beurden

Hi all,

I had some time to spare, so I compared current git with the FLAC 1.3.0
release in terms of encoding and decoding speed. This was done with
GCC 4.7.3 on AMD64 Linux.


There's a very nice speedup visible. Keep up the good work!


Attachment: FLAC 1.3.0 versus git a6a4b6f.pdf (Adobe PDF document)


Re: [flac-dev] Performance checks

2013-06-03 Thread Janne Hyvärinen
On 3.6.2013 14:24, Miroslav Lichvar wrote:
 On Sat, Jun 01, 2013 at 02:33:55PM +0300, Janne Hyvärinen wrote:
 On 1.6.2013 14:24, Janne Hyvärinen wrote:
 I can confirm. I see 10% speed improvement with that change on Core i7.
 Decoding a 1h18min38.133s long test FLAC -8 encoded file takes with
 normal asm optimizations 7.656s (speed: 616,266x realtime) and with that
 tiny change 6.937s (speed: 680,140x realtime).
 Thanks for the testing.

 I noticed a side effect for this change. Encoding got a bit slower at
 least when md5 checksumming is enabled.
 That's odd. How much slower was the encoding? Could it be caused by
 increase in the size of the function (only with -funroll-loops?) and
 not fitting in the cache during encoding?

 It might be good to use -funroll-loops only with some files, IIRC it
 helped most to stream_encoder.c.


I neglected to mention that the testing was done with MSVC 2012 on
Windows.
I did some further testing after your mail and noticed that with GCC the
encoding speed is unaffected. The decoding speed increase is not as big as
with MSVC: only a 7% improvement with GCC.

With MSVC the drop in encoding speed with my test file is 0.4%.

Other perhaps interesting speed results:
The MSVC build with unaltered sources is 1.9% faster than GCC at encoding.
GCC decoding is 8% faster than MSVC before the modification and 5.6%
faster after the modification.
These results are without changing any compiler options on either compiler.



Re: [flac-dev] Performance checks

2013-06-01 Thread Janne Hyvärinen
On 31.5.2013 13:04, Miroslav Lichvar wrote:
 On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote:
 I was surprised to see that the Windows compile on wine actually
 outperformed the native Linux one. Probably GCC 4.6 optimized a little
 better or something very weird is going on in wine, I don't know. The
 assembly optimizations work very well on encoding, but actually slow
 things down when decoding. The difference is not very large however.
 In a quick test with a pre 4.8 gcc on a Core 2 CPU I see a small
 improvement in decoding speed with assembly optimizations turned on,
 but I think the difference used to be larger. Perhaps the compilers
 got better or MMX is slower relative to normal code on current CPUs.

 Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap
 function seems to help a bit. (there is an #if disabling the function
 with comment OPT: not clearly faster, needs more testing in the
 src/libFLAC/stream_decoder.c file)

 Here is the relative decoding speed with -5 and -8:
                       -5      -8
 no asm                99.0%   97.0%
 asm                   100.0%  100.0%
 asm (no ia32_bswap)   102.7%  102.7%

 I think we should drop that assembly function as the C
 version seems to be faster now.

 Can anyone confirm this?

 Thanks,


I can confirm. I see a 10% speed improvement with that change on a Core i7.
Decoding a 1h18min38.133s long test file encoded with FLAC -8 takes
7.656s (616.266x realtime) with the normal asm optimizations and 6.937s
(680.140x realtime) with that tiny change.
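
The realtime factors in this message follow from dividing the track length by the decode time. The sketch below redoes that arithmetic with the figures quoted above. The helper name realtime_factor is chosen here for illustration and is not part of any FLAC tool.

```python
def realtime_factor(duration_s, decode_s):
    """Seconds of audio decoded per second of wall-clock time."""
    return duration_s / decode_s


# Track length from the message: 1h18min38.133s.
duration_s = 1 * 3600 + 18 * 60 + 38.133

before = realtime_factor(duration_s, 7.656)  # normal asm optimizations
after = realtime_factor(duration_s, 6.937)   # with the change applied

# Relative gain in decoding speed between the two runs.
speedup_pct = (after / before - 1.0) * 100.0

print(f"before:  {before:.3f}x realtime")
print(f"after:   {after:.3f}x realtime")
print(f"speedup: {speedup_pct:.1f}%")
```

The exact gain comes out slightly above the rounded 10% mentioned in the message.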



Re: [flac-dev] Performance checks

2013-06-01 Thread Janne Hyvärinen
On 1.6.2013 14:24, Janne Hyvärinen wrote:
 On 31.5.2013 13:04, Miroslav Lichvar wrote:
 On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote:
 I was surprised to see that the Windows compile on wine actually
 outperformed the native Linux one. Probably GCC 4.6 optimized a little
 better or something very weird is going on in wine, I don't know. The
 assembly optimizations work very well on encoding, but actually slow
 things down when decoding. The difference is not very large however.
 In a quick test with a pre 4.8 gcc on a Core 2 CPU I see a small
 improvement in decoding speed with assembly optimizations turned on,
 but I think the difference used to be larger. Perhaps the compilers
 got better or MMX is slower relative to normal code on current CPUs.

 Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap
 function seems to help a bit. (there is an #if disabling the function
 with comment OPT: not clearly faster, needs more testing in the
 src/libFLAC/stream_decoder.c file)

 Here is the relative decoding speed with -5 and -8:
                       -5      -8
 no asm                99.0%   97.0%
 asm                   100.0%  100.0%
 asm (no ia32_bswap)   102.7%  102.7%

 I think we should drop that assembly function as the C
 version seems to be faster now.

 Can anyone confirm this?

 Thanks,

 I can confirm. I see 10% speed improvement with that change on Core i7.
 Decoding a 1h18min38.133s long test FLAC -8 encoded file takes with
 normal asm optimizations 7.656s (speed: 616,266x realtime) and with that
 tiny change 6.937s (speed: 680,140x realtime).



I noticed a side effect of this change: encoding got a bit slower, at
least when MD5 checksumming is enabled.



Re: [flac-dev] Performance checks

2013-05-31 Thread Miroslav Lichvar
On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote:
 I was surprised to see that the Windows compile on wine actually 
 outperformed the native Linux one. Probably GCC 4.6 optimized a little 
 better or something very weird is going on in wine, I don't know. The 
 assembly optimizations work very well on encoding, but actually slow 
 things down when decoding. The difference is not very large however.

In a quick test with a pre-4.8 GCC on a Core 2 CPU I see a small
improvement in decoding speed with assembly optimizations turned on,
but I think the difference used to be larger. Perhaps the compilers
got better or MMX is slower relative to normal code on current CPUs.

Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap
function seems to help a bit. (There is an #if disabling the function,
with the comment "OPT: not clearly faster, needs more testing", in the
src/libFLAC/stream_decoder.c file.)

Here is the relative decoding speed with -5 and -8:
                      -5      -8
no asm                99.0%   97.0%
asm                   100.0%  100.0%
asm (no ia32_bswap)   102.7%  102.7%
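
For readers who want the gains spelled out, this sketch recomputes how much faster the "asm (no ia32_bswap)" build is than each of the other two, using only the percentages in the table above. The dictionary layout and the gain_pct helper are illustrative choices and not part of the original test setup.

```python
# Relative decoding speeds from the table (full asm build = 100%).
relative_speed = {
    "no asm": {"-5": 99.0, "-8": 97.0},
    "asm": {"-5": 100.0, "-8": 100.0},
    "asm (no ia32_bswap)": {"-5": 102.7, "-8": 102.7},
}


def gain_pct(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


candidate = "asm (no ia32_bswap)"

# Gain of the candidate build over every other build, per preset.
gains = {
    (other, preset): gain_pct(relative_speed[candidate][preset], speed)
    for other, speeds in relative_speed.items()
    if other != candidate
    for preset, speed in speeds.items()
}

for (other, preset), gain in gains.items():
    print(f"{candidate} vs {other} at {preset}: +{gain:.1f}%")
```
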

I think we should drop that assembly function as the C
version seems to be faster now.

Can anyone confirm this?

Thanks,

-- 
Miroslav Lichvar


Re: [flac-dev] Performance checks

2013-05-29 Thread Martijn van Beurden
On 28-05-13 20:09, Janne Hyvärinen wrote:
 On Windows the 32-bit NASM enabled compiles are always fastest. If you 
 can run 32-bit code on your Linux box you should compile with assembly 
 optimizations.

That depends on the way you define speed. For decoding this doesn't seem
to be true. I reran my tests; it took a little longer because I couldn't
believe the results I got. However, they are perfectly reproducible (on
my system at least), so I guess I'll have to believe them.

The first linked PDF shows a test with the average of 5 CDs, the second
the graph of only one of those 5. The 'speed ranking' for each
compression setting clearly matches very closely between the two, so the
accuracy is probably pretty high. I did this comparison on Kubuntu 12.10
64-bit.

http://www.icer.nl/misc_stuff/All tracks.pdf
http://www.icer.nl/misc_stuff/Coldplay - Parachutes.pdf

I was surprised to see that the Windows compile on wine actually 
outperformed the native Linux one. Probably GCC 4.6 optimized a little 
better or something very weird is going on in wine, I don't know. The 
assembly optimizations work very well on encoding, but actually slow 
things down when decoding. The difference is not very large however.

Anyway, I think I'm convinced now that my lossless codec comparison was 
valid and I can keep running codecs through wine. I should probably run 
all of them through wine just for the sake of clarity.



Re: [flac-dev] Performance checks

2013-05-28 Thread Miroslav Lichvar
On Tue, May 28, 2013 at 07:17:58PM +0200, Martijn van Beurden wrote:
 I was doing some checks in preparation of updating the comparison on
 the FLAC page this summer and I thought the results might be
 interesting for people on the dev list as well.

I'm always interested in performance tests :).

 The performance of the minGW-w64 build (through wine) and the native
 Linux 64-bit build is similar, so I guess my original question is
 answered: wine doesn't affect performance. However, I tried quite a
 few things building a 32-bit binary on my Linux system, but they are
 all *very* slow. Does anyone know why? I ran ./configure
 --build=i686-pc-linux-gnu CFLAGS='-m32' CPPFLAGS='-m32'
 LDFLAGS='-m32' and tried a bunch of other things. Any thoughts?

I think that if you are setting CFLAGS you also need to include the
optimization flags, e.g. -m32 -O3 -funroll-loops, to match the
default CFLAGS.
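
As a minimal sketch of that suggestion (not taken from the FLAC build scripts), the Python helper below assembles the ./configure invocation from the quoted message with the optimization flags added to CFLAGS. The helper name configure_args and the constant OPT_FLAGS are hypothetical. The flag values are the ones named in this thread. The code only builds and prints the command and does not run it.

```python
import shlex

# Optimization flags named in the reply above. That they match the
# build's defaults is the reply's claim, not something verified here.
OPT_FLAGS = ["-O3", "-funroll-loops"]


def configure_args(arch_flag="-m32", build="i686-pc-linux-gnu"):
    """Return the ./configure argument list for a 32-bit build."""
    # Setting CFLAGS by hand, so the optimization flags are listed too.
    cflags = " ".join([arch_flag] + OPT_FLAGS)
    return [
        "./configure",
        f"--build={build}",
        f"CFLAGS={cflags}",
        f"CPPFLAGS={arch_flag}",
        f"LDFLAGS={arch_flag}",
    ]


args = configure_args()
print(shlex.join(args))
```

The resulting list could be passed to subprocess.run(args, check=True) from the top of the source tree.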

Thanks,

-- 
Miroslav Lichvar


Re: [flac-dev] Performance checks

2013-05-28 Thread Janne Hyvärinen
On 28.5.2013 21:06, Martijn van Beurden wrote:
 On 28-05-13 19:38, Miroslav Lichvar wrote:
 I'm always interested in performance tests :).
 In that case I hope you saw the previous one, because the decoding
 speed-up was credited to be one of your patches, according to some
 people over at HydrogenAudio:
 http://lists.xiph.org/pipermail/flac-dev/2013-March/003856.html

 Really, great stuff ;)

 I think if you are setting CFLAGS you need to include also the
 optimizations flags, e.g. -m32 -O3 -funroll-loops to match the
 default CFLAGS.
 Oh, I thought I was appending flags. Anyway, thanks, that solved the
 problem. I didn't know compiler flags could make such an enormous
 difference. It will take me a few hours to rerun the test however.



On Windows the 32-bit NASM-enabled builds are always the fastest. If you
can run 32-bit code on your Linux box, you should compile with assembly
optimizations.
