Re: [flac-dev] 64-bit residuals
On Tue, Apr 05, 2022 at 11:41:12AM +0200, Martijn van Beurden wrote: > Extrapolating from these results, it seems unlikely that increasing > the residual limit beyond a 32-bit int would bring significant > compression improvement for music sources. Interesting. I wasn't expecting that. I tried your code with some 16-bit audio expanded with 16 bits of noise and I don't see a large difference here either. Thanks for you work on this. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] 64-bit residuals
On Wed, Mar 30, 2022 at 04:02:11PM +0200, Martijn van Beurden wrote: > Granted, it won't provide the best possible compression, but 32-bit > PCM is itself already an extremely inefficient format. So I think the > choice here is between slightly better compression and a simpler > encoder versus backward compatibility with a lot of decoders and no > added decoder complexity. Can you post some examples of the impact on compression ratio? Do the samples actually use all of the 32 bits? I have no experince with this. If compression of 16-bit audio was limited to 16-bit residuals, I suspect that would have a significant impact, so I'm wondering how it is different with 32-bit audio. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] 64-bit residuals
On Tue, Mar 29, 2022 at 04:57:10PM +0200, Martijn van Beurden wrote: > Op di 29 mrt. 2022 om 11:43 schreef Miroslav Lichvar : > > The third option makes most sense to me. I don't think we should > > complicate decoders by requiring them to support 64-bit residuals only > > because it's technically possible to encode such a stream. > > Would you argue this limitation is imposed on all possible FLAC > streams, or just for PCM inputs with a bit depth up to and including > 24? Or should this also apply to 32-bit streams? I think that should apply only to <=24 bits. If I understand it correctly, with 32 bits it would be a real limitation, not hit only with specifically crafted encodings. > A similar problem in the spec represents itself with 32-bit PCM > inputs. When applying stereo decorrelation, a transformation to side a > subframe bps of 33, which is very inconvenient. Should stereo > decorrelation be forbidden for 32 bps inputs? No, that doesn't seem acceptable to me. > While these distinctions might seem unimportant and 32-bit streams > folly, there is currently an effort underway to make FLAC and IETF RFC > standard. I think that the decoder of the reference implementation > (libFLAC) should support all features the format has before the > standard becomes final. As the FLAC format has always supported 32 bps > streams but no encoder for these exists, I think this requires some > extra care. The intended status of the current FLAC draft is informational, so there doesn't have to be any existing implementation that supports all bit depths, right? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] 64-bit residuals
On Thu, Mar 24, 2022 at 05:05:02PM +0100, Martijn van Beurden wrote: > 3) Add a note to the FLAC spec that residuals should not exceed > 32-bit, and add a mechanism to the encoder to ascertain this. In case > the encoder finds a residual exceeding the 32-bit range, it defaults > to using a verbatim subframe. > > Any thoughts? The third option makes most sense to me. I don't think we should complicate decoders by requiring them to support 64-bit residuals only because it's technically possible to encode such a stream. Decoders can produce wrong output or reject the stream as invalid, but they should not crash. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] FLAC frame boundaries and protocol
On Tue, Mar 05, 2019 at 10:26:54PM -0800, Brian Willoughby wrote: > Frames start with a 14-bit sync code, which is 13 “one" bits and 1 “zero" > bit. Subframes start with a 1-bit padding of “zero." Keep in mind that FLAC > is a bit stream, not a byte stream, so that 14-bit frame sync can happen > anywhere in a pair of bytes. You can’t simply scan memory bytes for a frame > sync, at least not unless you allow for 8 variations, apply bit masks, and > allow for non-word-address-aligned matches. The specification says frames are aligned to bytes: Following the frame header are encoded subframes, one for each channel, and finally, the frame is zero-padded to a byte boundary. The problem with finding frame boundaries is that the sync code and the 8-bit CRC are too short, so the whole frame needs to be parsed and the 16-bit CRC checked to be reasonably sure it really is a FLAC frame. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Behavior of safe_realloc_add_2op_()
On Wed, Jul 18, 2018 at 12:30:41PM +0200, Miroslav Lichvar wrote: > Should safe_realloc_add_2op_() be > changed to use safe_realloc_() instead of realloc()? Is there any code > in flac that relies on the current behavior? It does indeed look like some code that (indirectly) uses the safe_realloc_*() functions relies on the pointer not being freed. The reallocation errors are not handled and propagated back, so the pointers that would be freed might be dereferenced again. Please ignore the patches I sent. The callers need to be fixed too. This will require a careful review of a lot of code. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 3/2] Free memory on errors in all safe_realloc_*() functions
--- include/share/alloc.h | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/include/share/alloc.h b/include/share/alloc.h index 63878db0..97752f0c 100644 --- a/include/share/alloc.h +++ b/include/share/alloc.h @@ -174,34 +174,46 @@ static inline void *safe_realloc_add_2op_(void *ptr, size_t size1, size_t size2) static inline void *safe_realloc_add_3op_(void *ptr, size_t size1, size_t size2, size_t size3) { size2 += size1; - if(size2 < size1) + if(size2 < size1) { + free(ptr); return 0; + } size3 += size2; - if(size3 < size2) + if(size3 < size2) { + free(ptr); return 0; - return realloc(ptr, size3); + } + return safe_realloc_(ptr, size3); } static inline void *safe_realloc_add_4op_(void *ptr, size_t size1, size_t size2, size_t size3, size_t size4) { size2 += size1; - if(size2 < size1) + if(size2 < size1) { + free(ptr); return 0; + } size3 += size2; - if(size3 < size2) + if(size3 < size2) { + free(ptr); return 0; + } size4 += size3; - if(size4 < size3) + if(size4 < size3) { + free(ptr); return 0; - return realloc(ptr, size4); + } + return safe_realloc_(ptr, size4); } static inline void *safe_realloc_mul_2op_(void *ptr, size_t size1, size_t size2) { if(!size1 || !size2) - return realloc(ptr, 0); /* preserve POSIX realloc(ptr, 0) semantics */ - if(size1 > SIZE_MAX / size2) + return safe_realloc_(ptr, 0); /* preserve POSIX realloc(ptr, 0) semantics */ + if(size1 > SIZE_MAX / size2) { + free(ptr); return 0; + } return safe_realloc_(ptr, size1*size2); } @@ -209,10 +221,12 @@ static inline void *safe_realloc_mul_2op_(void *ptr, size_t size1, size_t size2) static inline void *safe_realloc_muladd2_(void *ptr, size_t size1, size_t size2, size_t size3) { if(!size1 || (!size2 && !size3)) - return realloc(ptr, 0); /* preserve POSIX realloc(ptr, 0) semantics */ + return safe_realloc_(ptr, 0); /* preserve POSIX realloc(ptr, 0) semantics */ size2 += size3; - if(size2 < size3) + if(size2 < size3) { + free(ptr); return 0; + } return safe_realloc_mul_2op_(ptr, size1, size2); } -- 2.17.1 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 2/2] Fix safe_realloc_add_2op_() to free memory when reallocation fails
--- include/share/alloc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/share/alloc.h b/include/share/alloc.h index 914de9ba..63878db0 100644 --- a/include/share/alloc.h +++ b/include/share/alloc.h @@ -168,7 +168,7 @@ static inline void *safe_realloc_add_2op_(void *ptr, size_t size1, size_t size2) free(ptr); return 0; } - return realloc(ptr, size2); + return safe_realloc_(ptr, size2); } static inline void *safe_realloc_add_3op_(void *ptr, size_t size1, size_t size2, size_t size3) -- 2.17.1 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 1/2] Avoid double free in iconvert()
When safe_realloc_add_2op_(utfbuf, ...) is called with an invalid size and returns 0, set utfbuf to 0 to avoid second deallocation later in the function. --- src/share/utf8/iconvert.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/share/utf8/iconvert.c b/src/share/utf8/iconvert.c index 472ca876..03068ac9 100644 --- a/src/share/utf8/iconvert.c +++ b/src/share/utf8/iconvert.c @@ -150,8 +150,10 @@ int iconvert(const char *fromcode, const char *tocode, return ret; } newbuf = safe_realloc_add_2op_(utfbuf, (ob - utfbuf), /*+*/1); -if (!newbuf) +if (!newbuf) { + utfbuf = 0; goto fail; +} ob = (ob - utfbuf) + newbuf; *ob = '\0'; *to = newbuf; -- 2.17.1 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] Behavior of safe_realloc_add_2op_()
I'm looking at an issue reported by the Coverity static analyzer. In iconvert() in src/share/utf8/iconvert.c on line 152 there is newbuf = safe_realloc_add_2op_(utfbuf, ...); If the request size is not valid, the function will free utfbuf and return 0. This is followed by goto fail and utfbuf is freed for the second time. A simply fix would be to set utfbuf to 0 if newbuf is 0. However, this would create a leak in the case when the size is ok, but the reallocation itself failed. Should safe_realloc_add_2op_() be changed to use safe_realloc_() instead of realloc()? Is there any code in flac that relies on the current behavior? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] about word size in bitreader/bitwriter
On Sun, Dec 20, 2015 at 01:30:57PM +0300, lvqcl wrote: > Erik de Castro Lopo wrote: > > > The think in and ideal world we would a: > > > > * Make it work correctly FLAC__BYTES_PER_WORD == 8 and compare the > > performance > > with FLAC__BYTES_PER_WORD == 4. > > * If there is an statistically measurable performance, keep it, otherwise > > remove the FLAC__BYTES_PER_WORD == 8 code all together. > > I'll try to do it, but I don't have a deep understanding of bit(read|write) > routines such as FLAC__bitreader_read_rice_signed_block() and other. > Maybe Miroslav Lichvar can say something? Using larger registers in the bitreader/bitwriter should reduce the number of instructions needed when reading/writing the encoded stream as fewer words need to be read/written and encoded values are less likely to be split across two words. In the measurements comparing 32-bit and 64-bit words you posted later, I'm wondering why there is a slow down in decoding of 24-bit files. Does profiling show it comes from the read_rice_signed_block() function or something else? Could it be slower SWAP_BE_WORD_TO_HOST or COUNT_ZERO_MSBS2? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] About a comment in stream_decoder.c
On Mon, Apr 20, 2015 at 08:00:20PM +0300, lvqcl wrote: I don't understand the comment in src/libFLAC/stream_decoder.c: /*@@ technically not pessimistic enough, should be more like if( (FLAC__uint64)order * FLAC__uint64)1)bps)-1) * ((1subframe-qlp_coeff_precision)-1) (((FLAC__uint64)-1) 32) ) */ if(bps + subframe-qlp_coeff_precision + FLAC__bitmath_ilog2(order) = 32) which is equivalent to the current bps + subframe-qlp_coeff_precision + FLAC__bitmath_ilog2(order) = 32 So IMHO the comment is incorrect. Yeah, the current code looks right to me. I think we already discussed this some time ago. I'd just remove that comment. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Two new CVEs against FLAC
On Thu, Dec 11, 2014 at 04:50:24PM +0100, Martijn van Beurden wrote: 2014-12-11 14:34 GMT+01:00 Miroslav Lichvar mlich...@redhat.com: So the problem is that FLAC__stream_decoder_process_single returns error before it finds a valid frame? I'm not sure whether we mean the same thing, but I think the problem is that seek_to_absolute_sample_ calls FLAC__stream_decoder_process_single, which calls read_frame_, which calls read_subframe_, which calls either read_subframe_fixed_ or read_subframe_lpc_, which call read_residual_partitioned_rice_. The return false set there is propagated all the way down. After few hours of a script seeking randomly in my FLAC library, I have now a reproducer for the problem. It seems to be as you say, the false value propagates through all layers and the seeking process stops. So we need to return true as a non-fatal error, but stop any further decoding on the invalid frame. I think the best fix would be to move the check for invalid predictor or partition order one layer up to the subframe decoding functions. I'll send a patch for review shortly. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH] src/libFLAC/stream_decoder.c : Rework fix for seeking bug.
To avoid crash caused by an unbound LPC decoding when predictor order is larger than blocksize, the sanity check needs to be moved to the subframe decoding functions. --- src/libFLAC/stream_decoder.c | 30 -- 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/src/libFLAC/stream_decoder.c b/src/libFLAC/stream_decoder.c index d13b23b..211b4db 100644 --- a/src/libFLAC/stream_decoder.c +++ b/src/libFLAC/stream_decoder.c @@ -1281,9 +1281,6 @@ FLAC__bool allocate_output_(FLAC__StreamDecoder *decoder, unsigned size, unsigne unsigned i; FLAC__int32 *tmp; - /* Make sure size is some sensible minimum value. Plumb through predictor_order maybe? */ - size = size FLAC__MAX_LPC_ORDER ? FLAC__MAX_LPC_ORDER : size ; - if(size = decoder-private_-output_capacity channels = decoder-private_-output_channels) return true; @@ -2594,6 +2591,11 @@ FLAC__bool read_subframe_fixed_(FLAC__StreamDecoder *decoder, unsigned channel, case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2: if(!FLAC__bitreader_read_raw_uint32(decoder-private_-input, u32, FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE_ORDER_LEN)) return false; /* read_callback_ sets the state for us */ + if(decoder-private_-frame.header.blocksize u32 order) { + send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); + decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; + return true; + } subframe-entropy_coding_method.data.partitioned_rice.order = u32; subframe-entropy_coding_method.data.partitioned_rice.contents = decoder-private_-partitioned_rice_contents[channel]; break; @@ -2673,6 +2675,11 @@ FLAC__bool read_subframe_lpc_(FLAC__StreamDecoder *decoder, unsigned channel, un case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2: if(!FLAC__bitreader_read_raw_uint32(decoder-private_-input, u32, FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE_ORDER_LEN)) return false; /* read_callback_ sets the state for us */ + if(decoder-private_-frame.header.blocksize u32 order) { + send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); + decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; + return true; + } subframe-entropy_coding_method.data.partitioned_rice.order = u32; subframe-entropy_coding_method.data.partitioned_rice.contents = decoder-private_-partitioned_rice_contents[channel]; break; @@ -2744,21 +2751,8 @@ FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder, unsigne const unsigned plen = is_extended? FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2_PARAMETER_LEN : FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE_PARAMETER_LEN; const unsigned pesc = is_extended? FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2_ESCAPE_PARAMETER : FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE_ESCAPE_PARAMETER; - /* sanity checks */ - if(partition_order == 0) { - if(decoder-private_-frame.header.blocksize predictor_order) { - send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); - decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; - return true; - } - } - else { - if(partition_samples predictor_order) { - send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); - decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; - return true; - } - } + /* invalid predictor and partition orders mush be handled in the callers */ + FLAC__ASSERT(partition_order 0? partition_samples = predictor_order : decoder-private_-frame.header.blocksize = predictor_order); if(!FLAC__format_entropy_coding_method_partitioned_rice_contents_ensure_size(partitioned_rice_contents, flac_max(6u, partition_order))) { decoder-protected_-state = FLAC__STREAM_DECODER_MEMORY_ALLOCATION_ERROR; -- 1.9.3 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Two new CVEs against FLAC
On Tue, Nov 25, 2014 at 11:40:37AM -0800, Erik de Castro Lopo wrote: Miroslav Lichvar wrote: I think the case with non-zero partition order may need to be fixed too. For example, with partition order of 1, predictor order of 16 and blocksize of 4, the function would return true and blocksize-order in the caller would still underflow. --- a/src/libFLAC/stream_decoder.c +++ b/src/libFLAC/stream_decoder.c @@ -2744,7 +2744,7 @@ FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder, unsigne if(partition_samples predictor_order) { send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; - return true; + return false; } } Thoughts? That may well be true. Is it possible to generate file that actually triggers this? Yes, I have created one by patching the frame encoder. I'll send it to you privately. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Two new CVEs against FLAC
On Wed, Nov 26, 2014 at 01:40:13AM -0800, Erik de Castro Lopo wrote: Brian Willoughby wrote: While we're on the topic, what sort of consequences are there, really, with this vulnerability? Worst case, your player stops playing on a file that cannot be played anyway. Yes, it's bad that you have to power-cycle the player to get it to restart, but it's not like you can be doing anything else at the same time you're playing a bad FLAC. Have I missed something? I think you are underestimating what a motivated cracker can do starting with a simple heap overflow. See: In this case the minimum amount of data that the attacker can write to the buffer seems to be nearly 16GB (4 * (INT_MAX - 31)), so I think most libFLAC applications will just crash. But I could very well be missing something. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Changelog: improved decoding
On Wed, Nov 26, 2014 at 06:09:38PM +0300, lvqcl wrote: There's an entry in the changelog: Improved decoding efficiency of all bit depths but especially so for 24 bits (lvqcl) A couple of comments: 1) The patch that improves encoding for all depths was proposed by Miroslav Lichvar http://git.xiph.org/?p=flac.git;a=commit;h=4eab6313cd2198b5647d925bdb3847590505fa21 2) Performance checks graph posted by Martijn van Beurden reminded me that the decoding improvements for FLAC were made only for 32-bit x64 architecture (aka IA32). FLAC is multiplatform, so it make sense to mention this. IIRC for other archs most of the decoding speed improvement was already in 1.3.0. That's where the C version of the read_rice_signed_block function was optimized. Now in 1.3.1 it's enabled also for x86, but the difference between the x86 assembly and the C function is much smaller. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Two new CVEs against FLAC
On Tue, Nov 25, 2014 at 12:29:33AM -0800, Erik de Castro Lopo wrote: Google Security Team member, Michele Spagnuolo, recently found two potential problems in the FLAC code base. They are : CVE-2014-9028 : Heap buffer write overflow https://git.xiph.org/?p=flac.git;a=commit;h=fcf0ba06ae12ccd7c67cee3c8d948df15f946b85 I'm trying to figure out how this one works. It seems the problem is integer underflow in the frame.header.blocksize-order expression used in read_subframe_fixed_() and read_subframe_lpc_() to get the number of encoded samples, which causes a buffer overflow in the LPC/fixed subframe decoding. The fix prevents that by returning false from read_residual_partitioned_rice_() to stop further decoding of the subframe when the partition order is 0 and blocksize is smaller than the predictor order. Is that correct? I think the case with non-zero partition order may need to be fixed too. For example, with partition order of 1, predictor order of 16 and blocksize of 4, the function would return true and blocksize-order in the caller would still underflow. --- a/src/libFLAC/stream_decoder.c +++ b/src/libFLAC/stream_decoder.c @@ -2744,7 +2744,7 @@ FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder, unsigne if(partition_samples predictor_order) { send_error_to_client_(decoder, FLAC__STREAM_DECODER_ERROR_STATUS_LOST_SYNC); decoder-protected_-state = FLAC__STREAM_DECODER_SEARCH_FOR_FRAME_SYNC; - return true; + return false; } } Thoughts? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] New release
On Sun, Nov 23, 2014 at 02:44:00AM -0800, Erik de Castro Lopo wrote: lvqcl wrote: I have a couple of questions: 1) Do you plan to release 1.3.1 pre1, pre2 etc or just 1.3.1 w/o any pre-releases? I had not planned to do a pre-release. FWIW, considering how much code has changed since 1.3.0, I'd rather see the security bug fixed in a new 1.3.0 release, maybe with some other serious bugs like the metaflac memory corruction, and have a prerelease for 1.3.1 to test it thoroughly. I know the new release is almost ready, but if some serious bug is found in 1.3.1, a new release will have to be made anyway to not force the users to the vulnerable version. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] 1.21 vs 1.3 encoding speed
On Sat, Jul 26, 2014 at 06:29:36PM +1000, Erik de Castro Lopo wrote: Martijn van Beurden wrote: See the following lines in the configure script if test x$user_cflags = x; then CFLAGS=-O3 -funroll-loops -Wall -W -Winline fi Maybe we should change this 'overriding' behaviour of CFLAGS and add -O3 just like most other options are added to the CFLAGS? Wow, I didn't even know that was there. I have now replaced the above with: CFLAGS=-O3 -funroll-loops -Wall -W -Winline $CFLAGS so CFLAGS are preserved and enforce optimisations. Well, this reverts the commit 18e0154. How is the user supposed to set CFLAGS without getting -O3 -funroll-loops there? (e.g. to minimize the size of the compiled binaries) -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] 1.21 vs 1.3 encoding speed
On Tue, Jul 29, 2014 at 07:47:33PM +1000, Erik de Castro Lopo wrote: Miroslav Lichvar wrote: Well, this reverts the commit 18e0154. How is the user supposed to set CFLAGS without getting -O3 -funroll-loops there? (e.g. to minimize the size of the compiled binaries) Is it -O3 or -funroll-loops that you have a problem with? Or both? What would you prefer to see as the default optimisation level? The -funroll-loops option improves the encoding speed, but it seems to increase the size significantly. I'm not suggesting to remove it from the default CFLAGS, just have an option to fully override them. I'm sure there is a solution to this. Lets find it. It would be nice if -funroll-loops was used only for the stream_encoder.c file, that's where it seems to matter most. I'm not sure if there is a better way to do this in automake, in the Fedora flac package I used this patch: http://pkgs.fedoraproject.org/cgit/flac.git/plain/flac-1.2.1-cflags.patch?id=66a59af0bdc5ae4a719aac5d4a8c41817906a01f -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Performance checks
On Wed, Jul 02, 2014 at 10:18:58PM +0200, Martijn van Beurden wrote: http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Linux-GCC-4.8.pdf http://www.audiograaf.nl/misc_stuff/FLAC-performance-test-Wine-MSVC-2013.pdf For the GCC 4.8 results, there is a*very nice 60% to 70% speed increase* when encoding with preset -8 between FLAC 1.3.0 and current git. That's indeed a significant improvement. I'm assuming most of it is from the SSE intrinsic code. Good job, lvqcl! -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] two patches of doubtful usefulness
On Thu, Jul 03, 2014 at 04:21:59PM +0400, lvqcl wrote: Erik de Castro Lopo wrote: There's the following code in stream_decoder.c: Like you, I don't see a lot of value in these. I think I'll decline these. FLAC__lpc_restore_signal_asm_ia32_mmx compares 'order' argument with 4 and if it's greater then it jumps to FLAC__lpc_restore_signal_asm_ia32. I wonder why the same wasn't done for PPC/Altivec: why libFLAC compares 'order' and 8 in C code and not in asm. Perhaps because it's easier to do it in C than in asm? :) Wrapping the check in #ifdef will save few instructions on non-ppc archs, but in this case I think it doesn't really matter. It would make the code less readable and more likely someone will forgot about it in the future and break it for ppc. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Mon, Jun 30, 2014 at 08:08:42PM +1000, Erik de Castro Lopo wrote: Can someone please provide me with a copy of the file snippet6.wav ? I'd like to test this some more. I got it from http://wootangent.net/~lsd/blah/snippet6.wav -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Thu, Jun 19, 2014 at 09:23:26PM +0200, Martijn van Beurden wrote: op 19-06-14 20:17, lvqcl schreef: It seems quite common for 16-bit files: I second that. Apparently it depends on the kind of music, some albums have no warnings (mostly classical music it seems), some have over 20 per file. I've seen a few with 100 per file (things like rock and heavy metal). Most of the FLAC files were encoded with FLAC 1.2.1. Thanks for the data. I think I was just confused. The residual seems to be always handled as 32-bit and it's only added to the original signal, there is no multiplication, so it shouldn't be a problem. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Lets work towards a new version
On Thu, Jun 19, 2014 at 06:52:05PM +0200, Martijn van Beurden wrote: One other issue that I've brought up before but hasn't been fixed yet: none of the people that have contributed to FLAC in the last few years (save Miroslav) are in the AUTHORS file :) Also, it would be nice to mention in the file that Erik is the current maintainer. :) -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Fri, Jun 20, 2014 at 01:21:03PM +0400, lvqcl wrote: Miroslav Lichvar цкщеу: +/* + * This is used to avoid overflow with unusual signals in 32-bit + * accumulator in the *precompute_partition_info_sums_* functions. + */ +#define FLAC__MAX_EXTRA_RESIDUAL_BPS 4 + /* WATCHOUT: + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS is the maximum +* assumed size of the average residual magnitude */ + if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS 32) { From FLAC__fixed_compute_residual: residual[i] = data[i] - 4*data[i-1] + 6*data[i-2] - 4*data[i-3] + data[i-4]; so max(residual[i]) == 16 * max(data[j]), or: max_bps(residual[]) == 4 + max_bps(data[]). Am I right that it's the reason why FLAC__MAX_EXTRA_RESIDUAL_BPS is equal to 4? Not really, it's just a guess. With LPC the maximum possible residual could be much larger than with the fixed predictor if the coefficients were chosen randomly, but the autocorrelation routine should keep them more reasonable. The snippet6.wav file needed 2, so I made it slightly larger to have some extra room. As overflow in the accumulator won't result in a data loss, I think this is good enough until someone can figure out a better approach. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Fri, Jun 20, 2014 at 02:15:31PM +0400, lvqcl wrote: Miroslav Lichvar wrote: As overflow in the accumulator won't result in a data loss, I think this is good enough until someone can figure out a better approach. FLAC calculates real bitdepth of input signal for every block. Is it feasible to calculate real bitdepth of residual signal and use this value instead of (bps + 4) ? That would be expensive and I'm not sure if it's worth the cost. It could be a separate function or it could be integrated in the fixed and LPC encoding functions. The later would be probably faster, but it would require patching the assembly code. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Conflict in float_t type
On Thu, Jun 19, 2014 at 07:28:11PM +1000, Erik de Castro Lopo wrote: Miroslav Lichvar wrote: In file included from /usr/include/math.h:45:0, from grabbag/replaygain.c:25: /usr/include/bits/mathdef.h:35:21: note: previous declaration of 'float_t' was here typedef long double float_t; /* `float' expressions are evaluated as What distro? What package and what version of the what package installed that file? Fedora rawhide, glibc-headers-2.19.90-21.fc21.1.i686. Looking at the file in an older Fedora, it doesn't seem to be something new. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
In the precompute_partition_info_sums_ function, instead of selecting 64-bit accumulator when the signal bps is larger than 16, revert to the original approach based on partition size, but make room for few extra bits to not overflow with unusual signals where the average residual magnitude may be larger than bps. It slightly improves the performance with standard encoding levels and 16-bit files as the 17-bit side channel can still be processed with the 32-bit accumulator and correctly selects the 64-bit accumulator with very large 16-bit partitions. This is related to commits 6f7ec60c and 187e596e. --- src/libFLAC/include/private/stream_encoder.h | 6 ++ src/libFLAC/stream_encoder.c | 14 ++ src/libFLAC/stream_encoder_intrin_sse2.c | 3 ++- src/libFLAC/stream_encoder_intrin_ssse3.c| 3 ++- 4 files changed, 16 insertions(+), 10 deletions(-) diff --git a/src/libFLAC/include/private/stream_encoder.h b/src/libFLAC/include/private/stream_encoder.h index d26039a..8147f9e 100644 --- a/src/libFLAC/include/private/stream_encoder.h +++ b/src/libFLAC/include/private/stream_encoder.h @@ -37,6 +37,12 @@ #include config.h #endif +/* + * This is used to avoid overflow with unusual signals in 32-bit + * accumulator in the *precompute_partition_info_sums_* functions. + */ +#define FLAC__MAX_EXTRA_RESIDUAL_BPS 4 + #if (defined FLAC__CPU_IA32 || defined FLAC__CPU_X86_64) defined FLAC__HAS_X86INTRIN #include private/cpu.h #include FLAC/format.h diff --git a/src/libFLAC/stream_encoder.c b/src/libFLAC/stream_encoder.c index e64ece2..8928a39 100644 --- a/src/libFLAC/stream_encoder.c +++ b/src/libFLAC/stream_encoder.c @@ -3872,10 +3872,9 @@ void precompute_partition_info_sums_( FLAC__ASSERT(default_partition_samples predictor_order); #if defined(FLAC__CPU_IA32) !defined FLAC__NO_ASM defined FLAC__HAS_NASM 0 - /* WATCHOUT: + bps is an assumption that the average residual magnitude will not be more than bps bits */ - /* previously the condition was: if(FLAC__bitmath_ilog2(default_partition_samples) + bps 32) */ - /* see http://git.xiph.org/?p=flac.git;a=commit;h=6f7ec60c7e7f05f5ab0b1cf6b7b0945e44afcd4b */ - if(bps = 16) { + /* WATCHOUT: + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS is the maximum +* assumed size of the average residual magnitude */ + if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS 32) { FLAC__precompute_partition_info_sums_32bit_asm_ia32_(residual, abs_residual_partition_sums, residual_samples + predictor_order, predictor_order, min_partition_order, max_partition_order); return; } @@ -3884,10 +3883,9 @@ void precompute_partition_info_sums_( /* first do max_partition_order */ { unsigned partition, residual_sample, end = (unsigned)(-(int)predictor_order); - /* WATCHOUT: + bps is an assumption that the average residual magnitude will not be more than bps bits */ - /* previously the condition was: if(FLAC__bitmath_ilog2(default_partition_samples) + bps 32) */ - /* see http://git.xiph.org/?p=flac.git;a=commit;h=6f7ec60c7e7f05f5ab0b1cf6b7b0945e44afcd4b */ - if(bps = 16) { + /* WATCHOUT: + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS is the maximum +* assumed size of the average residual magnitude */ + if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS 32) { FLAC__uint32 abs_residual_partition_sum; for(partition = residual_sample = 0; partition partitions; partition++) { diff --git a/src/libFLAC/stream_encoder_intrin_sse2.c b/src/libFLAC/stream_encoder_intrin_sse2.c index bef5545..4e9d5db 100644 --- a/src/libFLAC/stream_encoder_intrin_sse2.c +++ b/src/libFLAC/stream_encoder_intrin_sse2.c @@ -37,6 +37,7 @@ #ifndef FLAC__NO_ASM #if (defined FLAC__CPU_IA32 || defined FLAC__CPU_X86_64) defined FLAC__HAS_X86INTRIN #include private/stream_encoder.h +#include private/bitmath.h #ifdef FLAC__SSE2_SUPPORTED #include stdlib.h/* for abs() */ @@ -58,7 +59,7 @@ void FLAC__precompute_partition_info_sums_intrin_sse2(const FLAC__int32 residual unsigned e1, e3; __m128i mm_res, mm_sum, mm_mask; - if(bps = 16) { + if(FLAC__bitmath_ilog2(default_partition_samples) + bps + FLAC__MAX_EXTRA_RESIDUAL_BPS 32) { for(partition = residual_sample = 0; partition partitions; partition++) { end += default_partition_samples; mm_sum = _mm_setzero_si128(); diff --git a/src/libFLAC/stream_encoder_intrin_ssse3.c b/src/libFLAC/stream_encoder_intrin_ssse3.c index 95b5f62..669536a 100644 --- a/src/libFLAC/stream_encoder_intrin_ssse3.c +++ b/src/libFLAC/stream_encoder_intrin_ssse3.c @@
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Thu, Jun 19, 2014 at 03:30:22PM +0400, lvqcl wrote: BTW, what can you say about the following place in stream_decoder.c in read_subframe_lpc_() function: /*@@ technically not pessimistic enough, should be more like if( (FLAC__uint64)order * FLAC__uint64)1)bps)-1) * ((1subframe-qlp_coeff_precision)-1) (((FLAC__uint64)-1) 32) ) */ if(bps + subframe-qlp_coeff_precision + FLAC__bitmath_ilog2(order) = 32) Is it really not pessimistic enough? Can it be changed similarly to your patch for stream_encoder.c? Good question. I'm not sure what exactly Josh meant by that comment. The git message says just minor comments. I think the right size of the expression was meant to be (FLAC__uint64)132, otherwise it doesn't make much sense to me. With that it can rewritten in log2 as bps + subframe-qlp_coeff_precision + FLAC__bitmath_ilog2(order - 1) 32 This is indeed more pessimistic that the currently used check ( 32 vs = 32), but I think both make a mistake that the qlp coefficients and residuals are unsigned integers and are actually more pessimistic than they would need to be if residuals were at most bps wide. With signed multiplication I think the correct check would actually be bps + subframe-qlp_coeff_precision + FLAC__bitmath_ilog2(order) - 1 = 32 But, as we have seen with unusual data the residual signal can be wider than bps. The FLAC format specification doesn't seem to mention this. Should it be treated as a valid FLAC stream? Based on the analysis above, the currently used check allows residuals at most 1 bit wider than bps. Another problem could be that the local_lpc_restore_signal_16bit function may truncate the residual to 16 bits. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Thu, Jun 19, 2014 at 03:30:06PM +0200, Miroslav Lichvar wrote: But, as we have seen with unusual data the residual signal can be wider than bps. The FLAC format specification doesn't seem to mention this. Should it be treated as a valid FLAC stream? I think it would be interesting to know how common are such streams. I patched flac to print a warning on decoding or testing when this is detected, but didn't find any files with this problem in my (small) music collection. If someone has a large collection and some cycles to spare, can you please consider compiling flac from git with the attached patch and see if you have any files that fail with flac -t ? With the known problem file (snippet6.wav) encoded by 1.3.0 it prints this: WARNING: residual -11025151 wider than bps 24 WARNING: residual 41873263 wider than bps 24 WARNING: residual -67175215 wider than bps 24 WARNING: residual 69950995 wider than bps 24 WARNING: residual -67108864 wider than bps 24 ... WARNING: residual 11227392 wider than bps 24 WARNING: residual -8754288 wider than bps 24 snippet6.flac: ERROR, MD5 signature mismatch Thanks, -- Miroslav Lichvar diff --git a/src/libFLAC/stream_decoder.c b/src/libFLAC/stream_decoder.c index ddd8979..82318ae 100644 --- a/src/libFLAC/stream_decoder.c +++ b/src/libFLAC/stream_decoder.c @@ -99,7 +99,7 @@ static FLAC__bool read_subframe_constant_(FLAC__StreamDecoder *decoder, unsigned static FLAC__bool read_subframe_fixed_(FLAC__StreamDecoder *decoder, unsigned channel, unsigned bps, const unsigned order, FLAC__bool do_full_decode); static FLAC__bool read_subframe_lpc_(FLAC__StreamDecoder *decoder, unsigned channel, unsigned bps, const unsigned order, FLAC__bool do_full_decode); static FLAC__bool read_subframe_verbatim_(FLAC__StreamDecoder *decoder, unsigned channel, unsigned bps, FLAC__bool do_full_decode); -static FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder, unsigned predictor_order, unsigned partition_order, FLAC__EntropyCodingMethod_PartitionedRiceContents *partitioned_rice_contents, FLAC__int32 *residual, FLAC__bool is_extended); +static FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder, unsigned predictor_order, unsigned partition_order, FLAC__EntropyCodingMethod_PartitionedRiceContents *partitioned_rice_contents, FLAC__int32 *residual, FLAC__bool is_extended, int bps); static FLAC__bool read_zero_padding_(FLAC__StreamDecoder *decoder); static FLAC__bool read_callback_(FLAC__byte buffer[], size_t *bytes, void *client_data); #if FLAC__HAS_OGG @@ -2572,7 +2572,7 @@ FLAC__bool read_subframe_fixed_(FLAC__StreamDecoder *decoder, unsigned channel, switch(subframe-entropy_coding_method.type) { case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE: case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2: - if(!read_residual_partitioned_rice_(decoder, order, subframe-entropy_coding_method.data.partitioned_rice.order, decoder-private_-partitioned_rice_contents[channel], decoder-private_-residual[channel], /*is_extended=*/subframe-entropy_coding_method.type == FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2)) + if(!read_residual_partitioned_rice_(decoder, order, subframe-entropy_coding_method.data.partitioned_rice.order, decoder-private_-partitioned_rice_contents[channel], decoder-private_-residual[channel], /*is_extended=*/subframe-entropy_coding_method.type == FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2, bps)) return false; break; default: @@ -2651,7 +2651,7 @@ FLAC__bool read_subframe_lpc_(FLAC__StreamDecoder *decoder, unsigned channel, un switch(subframe-entropy_coding_method.type) { case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE: case FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2: - if(!read_residual_partitioned_rice_(decoder, order, subframe-entropy_coding_method.data.partitioned_rice.order, decoder-private_-partitioned_rice_contents[channel], decoder-private_-residual[channel], /*is_extended=*/subframe-entropy_coding_method.type == FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2)) + if(!read_residual_partitioned_rice_(decoder, order, subframe-entropy_coding_method.data.partitioned_rice.order, decoder-private_-partitioned_rice_contents[channel], decoder-private_-residual[channel], /*is_extended=*/subframe-entropy_coding_method.type == FLAC__ENTROPY_CODING_METHOD_PARTITIONED_RICE2, bps)) return false; break; default: @@ -2703,7 +2703,7 @@ FLAC__bool read_subframe_verbatim_(FLAC__StreamDecoder *decoder, unsigned channe return true; } -FLAC__bool read_residual_partitioned_rice_(FLAC__StreamDecoder *decoder
Re: [flac-dev] [PATCH] stream_encoder : Improve selection of residual accumulator width
On Thu, Jun 19, 2014 at 05:04:23PM +0200, Miroslav Lichvar wrote: Yes, it's the same check. Assuming residual can be at most FLAC__MAX_EXTRA_RESIDUAL_BPS bits wider than subframe_bps, I think it should be: if(subframe_bps + qlp_coeff_precision + FLAC__bitmath_ilog2(order) + FLAC__MAX_EXTRA_RESIDUAL_BPS - 1 = 32) if(subframe_bps + FLAC__MAX_EXTRA_RESIDUAL_BPS = 16 qlp_coeff_precision = 16) On second thought, this is probably not necessary. The residual seems to be always treated as 32-bit in the LPC encoding and decoding functions, even in the MMX assembly if I'm reading it correctly. It doesn't matter if it's wider than the original signal. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] exhaustive-model-search issue results in multi-gigabyte FLAC file
On Wed, Jul 17, 2013 at 07:45:53PM +1000, Erik de Castro Lopo wrote: The fix was changing one local variable from FLAC_uint32 to FLAC_uint64 in function precompute_partition_info_sums_(). https://git.xiph.org/?p=flac.git;a=commit;h=6f7ec60c7e7f05f5ab0b1cf6b7b0945e44afcd4b I don't like this fix. It will probably hurt performance with 16-bit data and it won't fix the problem in the assembly code. I think the check if 32-bit accumulator is enough should be improved instead if possible. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH] Disable FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap.
Don't use the assembly function since it seems to be slower than the current version of FLAC__bitreader_read_rice_signed_block. --- src/libFLAC/stream_decoder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/libFLAC/stream_decoder.c b/src/libFLAC/stream_decoder.c index f987c27..37934de 100644 --- a/src/libFLAC/stream_decoder.c +++ b/src/libFLAC/stream_decoder.c @@ -400,7 +400,7 @@ static FLAC__StreamDecoderInitStatus init_stream_internal_( #ifdef FLAC__CPU_IA32 FLAC__ASSERT(decoder-private_-cpuinfo.type == FLAC__CPUINFO_TYPE_IA32); #ifdef FLAC__HAS_NASM -#if 1 /*@@ OPT: not clearly faster, needs more testing */ +#if 0 /*@@ OPT: seems to be slower than FLAC__bitreader_read_rice_signed_block */ if(decoder-private_-cpuinfo.data.ia32.bswap) decoder-private_-local_bitreader_read_rice_signed_block = FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap; #endif -- 1.8.1.4 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Performance checks
On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote: I was surprised to see that the Windows compile on wine actually outperformed the native Linux one. Probably GCC 4.6 optimized a little better or something very weird is going on in wine, I don't know. The assembly optimizations work very well on encoding, but actually slow things down when decoding. The difference is not very large however. In a quick test with a pre 4.8 gcc on a Core 2 CPU I see a small improvement in decoding speed with assembly optimizations turned on, but I think the difference used to be larger. Perhaps the compilers got better or MMX is slower relative to normal code on current CPUs. Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap function seems to help a bit. (there is an #if disabling the function with comment OPT: not clearly faster, needs more testing in the src/libFLAC/stream_decoder.c file) Here is the relative decoding speed with -5 and -8: -5 -8 no asm 99.0% 97.0% asm 100.0% 100.0% asm (no ia32_bswap) 102.7% 102.7% I think we should drop that assembly function as the C version seems to be faster now. Can anyone confirm this? Thanks, -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Performance checks
On Tue, May 28, 2013 at 07:17:58PM +0200, Martijn van Beurden wrote: I was doing some checks in preparation of updating the comparison on the FLAC page this summer and I thought the results might be interesting for people on the dev list as well. I'm always interested in performance tests :). The performance of the minGW-w64 build (through wine) and the native Linux 64-bit build is similar, so I guess my original question is answered: wine doesn't affect performance. However, I tried quite a few things building a 32-bit binary on my Linux system, but they are all *very* slow. Does anyone know why? I ran ./configure --build=i686-pc-linux-gnu CFLAGS='-m32' CPPFLAGS='-m32' LDFLAGS='-m32' and tried a bunch of other things. Any thoughts? I think if you are setting CFLAGS you need to include also the optimizations flags, e.g. -m32 -O3 -funroll-loops to match the default CFLAGS. Thanks, -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH] Fix option names in documentation and help messages.
--- doc/html/documentation_tools_flac.html | 8 +--- doc/html/ru/documentation.html | 2 +- man/flac.1 | 6 -- man/flac.sgml | 5 +++-- src/flac/main.c| 4 +++- 5 files changed, 16 insertions(+), 9 deletions(-) diff --git a/doc/html/documentation_tools_flac.html b/doc/html/documentation_tools_flac.html index 759081c..5f7d67f 100644 --- a/doc/html/documentation_tools_flac.html +++ b/doc/html/documentation_tools_flac.html @@ -963,7 +963,7 @@ span class=argument--input-size=#/span /td td - Specify the size of the raw input in bytes. If you are encoding raw samples from stdin, you must set this option in order to be able to use --skip, --until, --cue-sheet, or other options that need to know the size of the input beforehand. If the size given is greater than what is found in the input stream, the encoder will complain about an unexpected end-of-file. If the size given is less, samples will be truncated. + Specify the size of the raw input in bytes. If you are encoding raw samples from stdin, you must set this option in order to be able to use --skip, --until, --cuesheet, or other options that need to know the size of the input beforehand. If the size given is greater than what is found in the input stream, the encoder will complain about an unexpected end-of-file. If the size given is less, samples will be truncated. /td /tr tr @@ -1017,6 +1017,7 @@ tr td nowrap=nowrap align=right valign=top bgcolor=#F4F4CC span class=argument--no-adaptive-mid-side/spanbr / + span class=argument--no-cued-seekpoints/spanbr / span class=argument--no-decode-through-errors/spanbr / span class=argument--no-delete-input-file/spanbr / span class=argument--no-escape-coding/spanbr / @@ -1026,7 +1027,7 @@ span class=argument--no-mid-side/spanbr / span class=argument--no-ogg/spanbr / span class=argument--no-padding/spanbr / - span class=argument--no-qlp-coeff-precision-search/spanbr / + span class=argument--no-qlp-coeff-prec-search/spanbr / span class=argument--no-residual-gnuplot/spanbr / span class=argument--no-residual-text/spanbr / span class=argument--no-sector-align/spanbr / @@ -1108,6 +1109,7 @@ a href=#flac_options_max_lpc_orderspan class=argument--max-lpc-order/span/abr / a href=#flac_options_mid_sidespan class=argument--mid-side/span/abr / a href=#negative_optionsspan class=argument--no-adaptive-mid-side/span/abr / + a href=#negative_optionsspan class=argument--no-cued-seekpoints/span/abr / a href=#negative_optionsspan class=argument--no-decode-through-errors/span/abr / a href=#negative_optionsspan class=argument--no-delete-input-file/span/abr / a href=#negative_optionsspan class=argument--no-escape-coding/span/abr / @@ -1118,7 +1120,7 @@ a href=#negative_optionsspan class=argument--no-ogg/span/abr / a href=#negative_optionsspan class=argument--no-padding/span/abr / a href=#negative_optionsspan class=argument--no-preserve-modtime/span/abr / - a href=#negative_optionsspan class=argument--no-qlp-coeff-precision-search/span/abr / + a href=#negative_optionsspan class=argument--no-qlp-coeff-prec-search/span/abr / a href=#negative_optionsspan class=argument--no-residual-gnuplot/span/abr / a href=#negative_optionsspan class=argument--no-residual-text/span/abr / a href=#negative_optionsspan class=argument--no-sector-align/span/abr / diff --git a/doc/html/ru/documentation.html b/doc/html/ru/documentation.html index 3497c2b..1c056af 100644 --- a/doc/html/ru/documentation.html +++ b/doc/html/ru/documentation.html @@ -437,7 +437,7 @@ TT--no-mid-side/TTBR TT--no-ogg/TTBR TT--no-padding/TTBR -TT--no-qlp-coeff-precision-search/TTBR +TT--no-qlp-coeff-prec-search/TTBR TT--no-residual-gnuplot/TTBR TT--no-residual-text/TTBR TT--no-sector-align/TTBR diff --git a/man/flac.1 b/man/flac.1 index bd112af..411963b 100644 --- a/man/flac.1 +++ b/man/flac.1 @@ -283,7
Re: [flac-dev] flac 1.3.0pre2 pre-release
On Sat, Mar 09, 2013 at 12:28:28PM -0500, Ben Allison wrote: Erik, Sorry for the confusion. There is one more patch. I had it in the first attempt I made but somehow these changes weren't in the redone patch. As mentioned before, this removes some of the 'inline' from the bitreader and bitwriter functions that were used in another translation unit. I'm surprised that this code works on other platform. It must be a bug in GCC, or maybe deliberately non-standard behavior. See 6.7.4 of the C99 spec for details. I don't see the problem. What exactly is the compiler error? It seems the declarations in the header files don't have inline and they are included with the definitions, so they shouldn't be inline definitions and should be callable from other units. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] flac 1.3.0pre2 pre-release
On Mon, Mar 11, 2013 at 08:30:18AM -0400, Ben Allison wrote: Take, for example, the function FLAC__bitreader_is_consumed_byte_aligned. It is prototyped in bitreader.h It is used in stream_decoder.c, so it must be defined and made available to the linker (external definition). However, the only definition in bitreader.c has been declared inline. From 6.7.4.6 An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. My understanding is that it's not an inline definition as there is a non-inline declaration in bitreader.h. If there are no better solutions to fix the problem, I'd suggest to #define inline to nothing for MSVC. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] flac 1.3.0pre2 pre-release
On Mon, Mar 11, 2013 at 08:51:43AM -0400, Ben Allison wrote: From 6.7.4.6 An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. My understanding is that it's not an inline definition as there is a non-inline declaration in bitreader.h. I don't see how that can be the case. The spec explicitly states that inline definitions do not provide an external definition. But it's not an inline definition according to 6.7.4.6 (the sentence before the part you have quoted): If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] Website comparison + fix IE
On Sun, Jan 06, 2013 at 06:12:41PM +0100, Martijn van Beurden wrote: Hi all, The past few weeks I've been busy comparing lossless audio codecs to update the comparison.html page on the FLAC website (and because I wrote a comparison for Hydrogenaudio in the past) and its ready now. Because the patch is pretty large, I've placed it here: http://www.icer.nl/misc_stuff/update-comparison-and-fix-IE-news.patch.zip Very nice. I'd like to see the comparison page updated too. I've just few questions on the pdf. Was the comparison done on a 32 or 64-bit system? Doesn't using wine negatively affect the performance? Wouldn't it be better to use a logarithmic scale in the graphs for encoding/decoding times and speeds? Thanks, -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [PATCH] Makefile.lite: Fix building with MSYS and MinGW(-w64), Improvements
On Thu, Dec 27, 2012 at 01:08:04PM -0600, J. Hendricks wrote: 4. The compiler complained when lround() in lpc.c was static, so it is no longer static. This seems strange. Was there any explanation why is it needed? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] /usr/include/FLAC no longer searched for headers
I've received a bugreport that vlc doesn't compile with the current flac. The problem seems to be that it includes stream_decoder.h instead of FLAC/stream_decoder.h. This no longer works due to the commit b76d4f (it was discussed on this list). I'm not sure how many clients are relying on the directory to be searched by default, but I think it might be worth mentioning in the changelog. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 4/5] Hide symbols with nasm.
Hide all cglobal symbols with nasm = 2. --- src/libFLAC/ia32/nasm.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/libFLAC/ia32/nasm.h b/src/libFLAC/ia32/nasm.h index 8455772..e58b744 100644 --- a/src/libFLAC/ia32/nasm.h +++ b/src/libFLAC/ia32/nasm.h @@ -57,7 +57,11 @@ %ifdef FLAC__PUBLIC_NEEDS_UNDERSCORE global _%1 %else - global %1 + %if __NASM_MAJOR__ = 2 + global %1:function hidden + %else + global %1 + %endif %endif %endmacro -- 1.7.11.7 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 3/5] Hide symbols with gcc.
With gcc = 4 and ELF, set default visibility to hidden and make visible only the symbols with FLAC_API or FLACPP_API. A convenience libFLAC-static.la is created for test_libFLAC as it depends on the hidden symbols. --- configure.ac | 8 +++- include/FLAC++/export.h | 13 + include/FLAC/export.h| 13 + src/libFLAC/Makefile.am | 10 +- src/test_libFLAC/Makefile.am | 2 +- 5 files changed, 35 insertions(+), 11 deletions(-) diff --git a/configure.ac b/configure.ac index f4a31e4..3899d68 100644 --- a/configure.ac +++ b/configure.ac @@ -360,9 +360,15 @@ if test x$ac_cv_c_compiler_gnu = xyes ; then if test x$enable_gcc_werror = xyes ; then CFLAGS=-Wall -Wextra -Werror $CFLAGS CXXFLAGS=-Wall -Wextra -Werror $CXXFLAGS - fi fi + if test $GCC_MAJOR_VERSION -ge 4 test $OBJ_FORMAT = elf; then + CPPFLAGS=$CPPFLAGS -DFLAC__USE_VISIBILITY_ATTR + CFLAGS=$CFLAGS -fvisibility=hidden + CXXFLAGS=$CXXFLAGS -fvisibility=hidden + fi +fi + #@@@ AM_CONDITIONAL(FLaC__HAS_AS__TEMPORARILY_DISABLED, test yes = no) diff --git a/include/FLAC++/export.h b/include/FLAC++/export.h index e3bc51f..d3bd136 100644 --- a/include/FLAC++/export.h +++ b/include/FLAC++/export.h @@ -55,17 +55,22 @@ * \{ */ -#if defined(FLAC__NO_DLL) || !defined(_MSC_VER) +#if defined(FLAC__NO_DLL) #define FLACPP_API -#else - +#elif defined(_MSC_VER) #ifdef FLACPP_API_EXPORTS #defineFLACPP_API _declspec(dllexport) #else #define FLACPP_API _declspec(dllimport) - #endif + +#elif defined(FLAC__USE_VISIBILITY_ATTR) +#define FLACPP_API __attribute__ ((visibility (default))) + +#else +#define FLACPP_API + #endif /* These #defines will mirror the libtool-based library version number, see diff --git a/include/FLAC/export.h b/include/FLAC/export.h index d239b9b..312746d 100644 --- a/include/FLAC/export.h +++ b/include/FLAC/export.h @@ -55,17 +55,22 @@ * \{ */ -#if defined(FLAC__NO_DLL) || !defined(_MSC_VER) +#if defined(FLAC__NO_DLL) #define FLAC_API -#else - +#elif defined(_MSC_VER) #ifdef FLAC_API_EXPORTS #defineFLAC_API_declspec(dllexport) #else #define FLAC_API _declspec(dllimport) - #endif + +#elif defined(FLAC__USE_VISIBILITY_ATTR) +#define FLAC_API __attribute__ ((visibility (default))) + +#else +#define FLAC_API + #endif /** These #defines will mirror the libtool-based library version number, see diff --git a/src/libFLAC/Makefile.am b/src/libFLAC/Makefile.am index b20e21a..a63a3d0 100644 --- a/src/libFLAC/Makefile.am +++ b/src/libFLAC/Makefile.am @@ -30,6 +30,7 @@ AM_CPPFLAGS = -I$(top_builddir) -I$(srcdir)/include -I$(top_srcdir)/include lib_LTLIBRARIES = libFLAC.la +noinst_LTLIBRARIES = libFLAC-static.la if DEBUG DEBUGCFLAGS = -DFLAC__OVERFLOW_DETECT endif @@ -106,7 +107,8 @@ extra_ogg_sources = \ endif # see 'http://www.gnu.org/software/libtool/manual/libtool.html#Libtool-versioning' for numbering convention libFLAC_la_LDFLAGS = -version-info 10:0:2 $(LOCAL_EXTRA_LDFLAGS) @LT_NO_UNDEFINED@ -libFLAC_la_SOURCES = \ + +libFLAC_sources = \ bitmath.c \ bitreader.c \ bitwriter.c \ @@ -125,3 +127,9 @@ libFLAC_la_SOURCES = \ stream_encoder_framing.c \ window.c \ $(extra_ogg_sources) + +libFLAC_la_SOURCES = $(libFLAC_sources) + +# needed for test_libFLAC +libFLAC_static_la_LIBADD = $(LOCAL_EXTRA_LIBADD) +libFLAC_static_la_SOURCES = $(libFLAC_sources) diff --git a/src/test_libFLAC/Makefile.am b/src/test_libFLAC/Makefile.am index 91a57fb..ed9aee8 100644 --- a/src/test_libFLAC/Makefile.am +++ b/src/test_libFLAC/Makefile.am @@ -27,7 +27,7 @@ test_libFLAC_LDADD = \ $(top_builddir)/src/share/grabbag/libgrabbag.la \ $(top_builddir)/src/share/replaygain_analysis/libreplaygain_analysis.la \ $(top_builddir)/src/test_libs_common/libtest_libs_common.la \ - $(top_builddir)/src/libFLAC/libFLAC.la \ + $(top_builddir)/src/libFLAC/libFLAC-static.la \ @OGG_LIBS@ \ -lm -- 1.7.11.7 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 2/5] Don't override user-specified CFLAGS.
--- configure.ac | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 2db9035..f4a31e4 100644 --- a/configure.ac +++ b/configure.ac @@ -25,6 +25,8 @@ AC_CONFIG_MACRO_DIR([m4]) AM_INIT_AUTOMAKE([foreign -Wall tar-pax no-dist-gzip dist-xz subdir-objects]) m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])]) +user_cflags=$CFLAGS + #Prefer whatever the current ISO standard is. AC_PROG_CC_STDC AC_USE_SYSTEM_EXTENSIONS @@ -346,7 +348,9 @@ if test x$debug = xtrue; then else CPPFLAGS=-DNDEBUG $CPPFLAGS if test x$GCC = xyes; then -CFLAGS=-O3 -funroll-loops -Wall -W -Winline $CFLAGS + if test x$user_cflags = x; then + CFLAGS=-O3 -funroll-loops -Wall -W -Winline + fi fi fi -- 1.7.11.7 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 1/5] Remove old GNU-stack sections from nasm files.
They are not needed since the section is defined in nasm.h. --- src/libFLAC/ia32/bitreader_asm.nasm | 4 src/libFLAC/ia32/cpu_asm.nasm| 4 src/libFLAC/ia32/fixed_asm.nasm | 4 src/libFLAC/ia32/lpc_asm.nasm| 4 src/libFLAC/ia32/stream_encoder_asm.nasm | 4 5 files changed, 20 deletions(-) diff --git a/src/libFLAC/ia32/bitreader_asm.nasm b/src/libFLAC/ia32/bitreader_asm.nasm index 4cd0ea2..b0f5ed6 100644 --- a/src/libFLAC/ia32/bitreader_asm.nasm +++ b/src/libFLAC/ia32/bitreader_asm.nasm @@ -590,7 +590,3 @@ cident FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap ret end - -%ifdef OBJ_FORMAT_elf - section .note.GNU-stack noalloc -%endif diff --git a/src/libFLAC/ia32/cpu_asm.nasm b/src/libFLAC/ia32/cpu_asm.nasm index a3a3b76..05a4e6f 100644 --- a/src/libFLAC/ia32/cpu_asm.nasm +++ b/src/libFLAC/ia32/cpu_asm.nasm @@ -115,7 +115,3 @@ cident FLAC__cpu_info_extended_amd_asm_ia32 ret end - -%ifdef OBJ_FORMAT_elf - section .note.GNU-stack noalloc -%endif diff --git a/src/libFLAC/ia32/fixed_asm.nasm b/src/libFLAC/ia32/fixed_asm.nasm index 0d4fe1a..d04e036 100644 --- a/src/libFLAC/ia32/fixed_asm.nasm +++ b/src/libFLAC/ia32/fixed_asm.nasm @@ -306,7 +306,3 @@ cident FLAC__fixed_compute_best_predictor_asm_ia32_mmx_cmov ret end - -%ifdef OBJ_FORMAT_elf - section .note.GNU-stack noalloc -%endif diff --git a/src/libFLAC/ia32/lpc_asm.nasm b/src/libFLAC/ia32/lpc_asm.nasm index fe4ae88..fadd80a 100644 --- a/src/libFLAC/ia32/lpc_asm.nasm +++ b/src/libFLAC/ia32/lpc_asm.nasm @@ -1505,7 +1505,3 @@ cident FLAC__lpc_restore_signal_asm_ia32_mmx ret end - -%ifdef OBJ_FORMAT_elf - section .note.GNU-stack noalloc -%endif diff --git a/src/libFLAC/ia32/stream_encoder_asm.nasm b/src/libFLAC/ia32/stream_encoder_asm.nasm index fef15d8..987345b 100644 --- a/src/libFLAC/ia32/stream_encoder_asm.nasm +++ b/src/libFLAC/ia32/stream_encoder_asm.nasm @@ -153,7 +153,3 @@ cident precompute_partition_info_sums_32bit_asm_ia32_ ret end - -%ifdef OBJ_FORMAT_elf - section .note.GNU-stack noalloc -%endif -- 1.7.11.7 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCHv2] Add missing options to flac man page.
--- man/flac.1| 4 man/flac.sgml | 2 ++ 2 files changed, 6 insertions(+) diff --git a/man/flac.1 b/man/flac.1 index eaeb028..bd112af 100644 --- a/man/flac.1 +++ b/man/flac.1 @@ -310,6 +310,8 @@ Force the decoder to output Wave64 format. This option is not needed if the out .TP \fB--no-exhaustive-model-search\fR .TP +\fB--no-force\fR +.TP \fB--no-lax\fR .TP \fB--no-mid-side\fR @@ -320,6 +322,8 @@ Force the decoder to output Wave64 format. This option is not needed if the out .TP \fB--no-qlp-coeff-precision-search\fR .TP +\fB--no-replay-gain\fR +.TP \fB--no-residual-gnuplot\fR .TP \fB--no-residual-text\fR diff --git a/man/flac.sgml b/man/flac.sgml index 3b96644..587e377 100644 --- a/man/flac.sgml +++ b/man/flac.sgml @@ -698,11 +698,13 @@ termoption--no-preserve-modtime/option/term termoption--no-keep-foreign-metadata/option/term termoption--no-exhaustive-model-search/option/term + termoption--no-force/option/term termoption--no-lax/option/term termoption--no-mid-side/option/term termoption--no-ogg/option/term termoption--no-padding/option/term termoption--no-qlp-coeff-precision-search/option/term + termoption--no-replay-gain/option/term termoption--no-residual-gnuplot/option/term termoption--no-residual-text/option/term termoption--no-sector-align/option/term -- 1.7.11.7 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] Status of flac; new release?
On Fri, Nov 30, 2012 at 07:10:44AM +1100, Erik de Castro Lopo wrote: Max Horn wrote: 2) Any chance for a new release? Just bug fixes would be enough. I'd like to do a release some time between now and xmas. That release would be something like what us in Xiph's git right now. Looking at the changes in the include directory since 1.2.1, it seems the C API was extended by one function and the C++ API by ~18 new methods. Are you planning to update the version-info? I see the C++ version-info was changed to 8:0:3, but I think it should actually be 9:0:3. The values in libFLAC{,++}/export.h might need an update too. Also, will it be called 1.2.2? I'd like to make some prerelease rpm packages from the current git snapshot. Thanks for all the work you have put into this. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [Flac-dev] Git branch with compiling fixes for win32
On Fri, May 04, 2012 at 06:09:06PM +0200, Miroslav Lichvar wrote: On Fri, May 04, 2012 at 05:53:23PM +0200, Miroslav Lichvar wrote: The most interesting part of the patch is the rewrite of the FLAC__bitreader_read_rice_signed_block function, which in the git repo seems to have only couple lines changed since 1.2.1. Here is that part of the patch rebased against current git. In a quick test it gives a 10% speedup in decoding. Will be this patch included in the upcoming release? Or does it need more work before it can be accepted? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 2/3] Add new clz function which works with input 0.
--- src/libFLAC/include/private/bitmath.h |9 - 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/src/libFLAC/include/private/bitmath.h b/src/libFLAC/include/private/bitmath.h index d32b1a7..4e60f78 100644 --- a/src/libFLAC/include/private/bitmath.h +++ b/src/libFLAC/include/private/bitmath.h @@ -64,7 +64,6 @@ static inline unsigned int FLAC__clz_soft_uint32(unsigned int word) }; return (word) 0xff ? byte_to_unary_table[(word) 24] : -!(word) ? 32 : (word) 0x ? byte_to_unary_table[(word) 16] + 8 : (word) 0xff ? byte_to_unary_table[(word) 8] + 16 : byte_to_unary_table[(word)] + 24; @@ -88,6 +87,14 @@ static inline unsigned int FLAC__clz_uint32(FLAC__uint32 v) #endif } +/* This one works with input 0 */ +static inline unsigned int FLAC__clz2_uint32(FLAC__uint32 v) +{ +if (!v) +return 32; +return FLAC__clz_uint32(v); +} + /* An example of what FLAC__bitmath_ilog2() computes: * * ilog2( 0) = undefined -- 1.7.7.6 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 1/3] Make FLAC__clz_soft_uint32 static.
--- src/libFLAC/include/private/bitmath.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/libFLAC/include/private/bitmath.h b/src/libFLAC/include/private/bitmath.h index 61b0e03..d32b1a7 100644 --- a/src/libFLAC/include/private/bitmath.h +++ b/src/libFLAC/include/private/bitmath.h @@ -42,7 +42,7 @@ #endif /* Will never be emitted for MSVC, GCC, Intel compilers */ -inline unsigned int FLAC__clz_soft_uint32(unsigned int word) +static inline unsigned int FLAC__clz_soft_uint32(unsigned int word) { static const unsigned char byte_to_unary_table[] = { 8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, -- 1.7.7.6 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[flac-dev] [PATCH 3/3] Optimize FLAC__bitreader_read_rice_signed_block.
--- src/libFLAC/bitreader.c | 445 +++ 1 files changed, 105 insertions(+), 340 deletions(-) diff --git a/src/libFLAC/bitreader.c b/src/libFLAC/bitreader.c index 2660c42..099b703 100644 --- a/src/libFLAC/bitreader.c +++ b/src/libFLAC/bitreader.c @@ -711,379 +711,144 @@ FLAC__bool FLAC__bitreader_read_rice_signed(FLAC__BitReader *br, int *val, unsig } /* this is by far the most heavily used reader call. it ain't pretty but it's fast */ -/* a lot of the logic is copied, then adapted, from FLAC__bitreader_read_unary_unsigned() and FLAC__bitreader_read_raw_uint32() */ FLAC__bool FLAC__bitreader_read_rice_signed_block(FLAC__BitReader *br, int vals[], unsigned nvals, unsigned parameter) -/* OPT: possibly faster version for use with MSVC */ -#ifdef _MSC_VER { - unsigned i; - unsigned uval = 0; - unsigned bits; /* the # of binary LSBs left to read to finish a rice codeword */ - /* try and get br-consumed_words and br-consumed_bits into register; * must remember to flush them back to *br before calling other -* bitwriter functions that use them, and before returning */ - register unsigned cwords; - register unsigned cbits; +* bitreader functions that use them, and before returning */ + unsigned cwords, words, lsbs, msbs, x, y; + unsigned ucbits; /* keep track of the number of unconsumed bits in word */ + uint32_t b; + int *val, *end; FLAC__ASSERT(0 != br); FLAC__ASSERT(0 != br-buffer); /* WATCHOUT: code does not work with 32bit words; we can make things much faster with this assertion */ FLAC__ASSERT(FLAC__BITS_PER_WORD = 32); FLAC__ASSERT(parameter 32); - /* the above two asserts also guarantee that the binary part never straddles more that 2 words, so we don't have to loop to read it */ - - if(nvals == 0) - return true; - - cbits = br-consumed_bits; - cwords = br-consumed_words; + /* the above two asserts also guarantee that the binary part never straddles more than 2 words, so we don't have to loop to read it */ - while(1) { + val = vals; + end = vals + nvals; - /* read unary part */ - while(1) { - while(cwords br-words) { /* if we've not consumed up to a partial tail word... */ - uint32_t b = br-buffer[cwords] cbits; - if(b) { -#if 0 /* slower, probably due to bad register allocation... */ defined FLAC__CPU_IA32 !defined FLAC__NO_ASM FLAC__BITS_PER_WORD == 32 - __asm { - bsr eax, b - not eax - and eax, 31 - mov i, eax - } -#else - i = FLAC__clz_uint32(b); -#endif - uval += i; - bits = parameter; - i++; - cbits += i; - if(cbits == FLAC__BITS_PER_WORD) { - crc16_update_word_(br, br-buffer[cwords]); - cwords++; - cbits = 0; - } - goto break1; - } - else { - uval += FLAC__BITS_PER_WORD - cbits; - crc16_update_word_(br, br-buffer[cwords]); - cwords++; - cbits = 0; - /* didn't find stop bit yet, have to keep going... */ - } - } - /* at this point we've eaten up all the whole words; have to try -* reading through any tail bytes before calling the read callback. -* this is a repeat of the above logic adjusted for the fact we -* don't have a whole word. note though if the client is feeding -* us data a byte at a time (unlikely), br-consumed_bits may not -* be zero. -*/ - if(br-bytes*8 cbits) { - const unsigned end = br-bytes * 8; - uint32_t b = (br-buffer[cwords] (FLAC__WORD_ALL_ONES (FLAC__BITS_PER_WORD-end))) cbits; - if(b) { - i =
[flac-dev] [PATCH] Optimize COUNT_ZERO_MSBS macro
Reorder the conditions according to the expected distribution of input signal. This seems to make it almost as fast as the clz builtin using the bsr instruction. --- src/libFLAC/bitreader.c |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/libFLAC/bitreader.c b/src/libFLAC/bitreader.c index 7ae086d..3d947af 100644 --- a/src/libFLAC/bitreader.c +++ b/src/libFLAC/bitreader.c @@ -68,9 +68,11 @@ COUNT_ZERO_MSBS (uint32_t word) #else /* counts the # of zero MSBs in a word */ #define COUNT_ZERO_MSBS(word) ( \ - (word) = 0x ? \ - ( (word) = 0xff? byte_to_unary_table[word] + 24 : byte_to_unary_table[(word) 8] + 16 ) : \ - ( (word) = 0xff? byte_to_unary_table[word 16] + 8 : byte_to_unary_table[(word) 24] ) \ + (word) 0xff ? byte_to_unary_table[(word) 24] : \ + !(word) ? 32 : \ + (word) 0x ? byte_to_unary_table[(word) 16] + 8 : \ + (word) 0xff ? byte_to_unary_table[(word) 8] + 16 : \ + byte_to_unary_table[(word)] + 24 \ ) #endif -- 1.7.7.6 ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [Flac-dev] Git branch with compiling fixes for win32
On Fri, May 04, 2012 at 11:22:00AM -0400, Cristian Rodríguez wrote: El 03/05/12 12:19, Miroslav Lichvar escribió: It makes the C function faster than the corresponding asm routine, so if it's included I'd suggest to just drop the asm function to not keep around more asm code than is necessary. With current compilers it is very likely that those routines are already superflous. It seems the current compilers are not that good yet :). In a test on a Core 2 machine with gcc-4.6.3, i686 flac build with nasm enabled is about 7% faster in decoding than without nasm. x86_64 build is about 2% faster than the i686 build with nasm enabled. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [Flac-dev] Git branch with compiling fixes for win32
On Mon, May 07, 2012 at 09:19:52PM +0400, LRN wrote: In a test on a Core 2 machine with gcc-4.6.3, i686 flac build with nasm enabled is about 7% faster in decoding than without nasm. x86_64 build is about 2% faster than the i686 build with nasm enabled. Was that with -O2 or -O3? Not sure, I didn't set CFLAGS which seems to result in using both: -O3 -funroll-loops -Wall -W -Winline -g -O2 I think funroll-loops can be harmful, the Fedora package is compiled with -O2 and funroll-loops is used only for the stream_encoder file and it seems decoding is a tiny bit faster than with the git version. Unfortunately I don't recall the details. On Mon, May 07, 2012 at 01:23:03PM -0400, Cristian Rodríguez wrote: Did you build with -fprofile-generate ... then make check .. then rebuild with -fprofile-use ? No, I have not tried that. -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [flac-dev] [Flac-dev] Git branch with compiling fixes for win32
On Fri, May 04, 2012 at 11:13:05AM -0400, Cristian Rodríguez wrote: Both Erick and I did already submitted patches to the tree that do just exactly what your flac-1.2.1-bitreader.patch intended.. please checkout current GIT tree. The most interesting part of the patch is the rewrite of the FLAC__bitreader_read_rice_signed_block function, which in the git repo seems to have only couple lines changed since 1.2.1. BTW, how much faster is the code with the clz builtin? If the architecture doesn't have the instruction, will be the gcc code as fast as the original code? -- Miroslav Lichvar ___ flac-dev mailing list flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [Flac-dev] bitreader optimizations
On Mon, Mar 17, 2008 at 06:55:01PM +0100, Miroslav Lichvar wrote: attached are patches that improve decoding speed a bit. The first patch improves the bit scan macro used for decoding unary values, the second one adds a GCC inline assembly for bswap and the third patch replaces the read_rice_block function. The third patch has a bug causing reading past input buffer, attaching a fixed version. Any chance on including these patches? The speedup on non-i386 archs is quite significant and Fedora packages have the patches included for more than 9 months, without any bug reports. -- Miroslav Lichvar ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[Flac-dev] CVS flac crashing on 64bit
flac from CVS crashes on 64bit architectures, both encoding and decoding fails. This patch fixes the problem, but more casting is required to remove compiler warnings. --- src/libFLAC/lpc.c 4 Apr 2007 00:59:28 - 1.61 +++ src/libFLAC/lpc.c 27 May 2007 16:14:01 - @@ -303,7 +303,7 @@ } #else /* fully unrolled version for normal use */ { - unsigned i; + int i; FLAC__int32 sum; FLAC__ASSERT(order 0); @@ -833,7 +833,7 @@ } #else /* fully unrolled version for normal use */ { - unsigned i; + int i; FLAC__int32 sum; FLAC__ASSERT(order 0); -- Miroslav Lichvar ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
[Flac-dev] negative LPC shift
From the FLAC format description: 5 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement). What happens when the shift is negative? Problem is that libFLAC decoder (and possibly all other decoders) uses always '' operator and this operator shifts only to right. (int)x -1 is the same as (int)x 31. -- Miroslav Lichvar ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [Flac-dev] amd64 issue with flac-1.1.3 beta2
On Sun, Nov 05, 2006 at 11:17:03AM +1100, David Collett wrote: I think I have tracked it down to FLAC__bitbuffer_get_buffer in bitbuffer.c taking an unsigned * for bytes, but write_bitbuffer_ in stream_encoder.c is passing a size_t *. On my system it looks like unsigned is 4 bytes, where size_t is 8 bytes. Changing FLAC__bitbuffer_get_buffer to take a size_t * seems to fix it. But as I said, I only have 1 days amd64 experience, so I don't really know what I am doing! I also havent checked if anyone else is calling get_buffer with an unsigned * so maby this will break elseware now. Has anyone else tested on amd64? Yes, I can confirm it. There is also one missed replacement in FLAC__BitbufferReadCallback declaration. -- Miroslav Lichvar ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [Flac-dev] better seeking
On Mon, Oct 30, 2006 at 11:13:25AM -0800, Josh Coalson wrote: my apologies for not doing this before Miroslav... I will definitely integrate it this time. Thanks. Sending latest version of the patch. Now it can seek in files that have large id3 tag (or any random data) at the end and it won't loop on streams with shuffled frames. -- Miroslav Lichvar Index: src/libFLAC/stream_decoder.c === RCS file: /cvsroot/flac/flac/src/libFLAC/stream_decoder.c,v retrieving revision 1.117 diff -u -r1.117 stream_decoder.c --- src/libFLAC/stream_decoder.c16 Oct 2006 15:49:18 - 1.117 +++ src/libFLAC/stream_decoder.c3 Nov 2006 08:32:35 - @@ -2923,23 +2923,23 @@ FLAC__bool seek_to_absolute_sample_(FLAC__StreamDecoder *decoder, FLAC__uint64 stream_length, FLAC__uint64 target_sample) { - FLAC__uint64 first_frame_offset = decoder-private_-first_frame_offset, lower_bound, upper_bound; - FLAC__int64 pos = -1, last_pos = -1; - int i, lower_seek_point = -1, upper_seek_point = -1; + FLAC__uint64 first_frame_offset = decoder-private_-first_frame_offset, lower_bound, upper_bound, lower_bound_sample, upper_bound_sample, this_frame_sample; + FLAC__int64 pos = -1; + int i; unsigned approx_bytes_per_frame; - FLAC__uint64 last_frame_sample = FLAC__U64L(0x); - FLAC__bool needs_seek; + FLAC__bool needs_seek = true, first_seek = true; const FLAC__uint64 total_samples = FLAC__stream_decoder_get_total_samples(decoder); const unsigned min_blocksize = decoder-private_-stream_info.data.stream_info.min_blocksize; const unsigned max_blocksize = decoder-private_-stream_info.data.stream_info.max_blocksize; const unsigned max_framesize = decoder-private_-stream_info.data.stream_info.max_framesize; - const unsigned channels = FLAC__stream_decoder_get_channels(decoder); - const unsigned bps = FLAC__stream_decoder_get_bits_per_sample(decoder); + const unsigned min_framesize = decoder-private_-stream_info.data.stream_info.min_framesize; + const unsigned channels = decoder-private_-stream_info.data.stream_info.channels; + const unsigned bps = decoder-private_-stream_info.data.stream_info.bits_per_sample; const FLAC__StreamMetadata_SeekTable *seek_table = decoder-private_-has_seek_table? decoder-private_-seek_table.data.seek_table : 0; - /* we are just guessing here, but we want to guess high, not low */ + /* we are just guessing here */ if(max_framesize 0) - approx_bytes_per_frame = max_framesize; + approx_bytes_per_frame = (max_framesize + min_framesize) / 2 + 1; /* * Check if it's a known fixed-blocksize stream. Note that though @@ -2958,15 +2958,12 @@ * First, we set an upper and lower bound on where in the * stream we will search. For now we assume the worst case * scenario, which is our best guess at the beginning of -* the first and last frames. +* the first frame and end of the stream. */ lower_bound = first_frame_offset; - - /* calc the upper_bound, beyond which we never want to seek */ - if(max_framesize 0) - upper_bound = stream_length - (max_framesize + 128 + 2); /* 128 for a possible ID3V1 tag, 2 for indexing differences */ - else - upper_bound = stream_length - ((channels * bps * FLAC__MAX_BLOCK_SIZE) / 8 + 128 + 2); + lower_bound_sample = 0; + upper_bound = stream_length; + upper_bound_sample = total_samples 0 ? total_samples : target_sample; /* * Now we refine the bounds if we have a seektable with @@ -2981,7 +2978,7 @@ } if(i = 0) { /* i.e. we found a suitable seek point... */ lower_bound = first_frame_offset + seek_table-points[i].stream_offset; - lower_seek_point = i; + lower_bound_sample = seek_table-points[i].sample_number; } /* find the closest seek point target_sample, if it exists */ @@ -2991,98 +2988,36 @@ } if(i (int)seek_table-num_points) { /* i.e. we found a suitable seek point... */ upper_bound = first_frame_offset + seek_table-points[i].stream_offset; - upper_seek_point = i; + upper_bound_sample = seek_table-points[i].sample_number; } } + decoder-private_-target_sample = target_sample; - /* -* Now guess at where within those bounds our target -* sample will be. -*/ - if(seek_table lower_seek_point = 0) { - /* first see if our sample is within a few frames of the lower seekpoint */ - if(seek_table-points[lower_seek_point
Re: [Flac-dev] better seeking
On Fri, Nov 03, 2006 at 10:01:42AM +0100, Miroslav Lichvar wrote: Thanks. Sending latest version of the patch. Now it can seek in files that have large id3 tag (or any random data) at the end and it won't loop on streams with shuffled frames. One small patch on top of that and it's bug free. ;) -- Miroslav Lichvar --- flac/src/libFLAC/stream_decoder.c.orig 2006-11-03 18:52:38.104291323 +0100 +++ flac/src/libFLAC/stream_decoder.c 2006-11-03 18:53:41.350727144 +0100 @@ -2995,7 +2995,7 @@ while(1) { /* check if the bounds are still ok */ - if (lower_bound_sample + FLAC__MIN_BLOCK_SIZE upper_bound_sample || lower_bound upper_bound) { + if (lower_bound_sample = upper_bound_sample || lower_bound upper_bound) { decoder-protected_-state = FLAC__STREAM_DECODER_SEEK_ERROR; return false; } ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [Flac-dev] PATCH for seek bug (#1154585)
On Sat, Oct 28, 2006 at 12:10:41AM -0400, Aaron Graham wrote: I had a problem where the seekable_stream_decoder was getting stuck in an infinite loop sometimes. It looks like I'm not the only one that's had the problem: http://lists.xiph.org/pipermail/flac-dev/2004-February/001508.html see also sourceforge bug #1154585 This bug is fixed in CVS already. -- Miroslav Lichvar ___ Flac-dev mailing list Flac-dev@xiph.org http://lists.xiph.org/mailman/listinfo/flac-dev
Re: [Flac-dev] better seeking
Ok, the patch from 2003 about improving seeking still didn't make it to CVS, so here is another try. I made some benchmarking with the test_seeking utility from flac sources to show how bad the current seeking is, especially without seektable. Track used for the experiment had about 50 minutes. In the following table is average number of seeks and number of decoded frames required for one complete absolute seek. I tried several files encoded with fixed/variable block size, seektable with seekpoints every 10s, 100s and no seektable. unpatched patched block size seekdecoded decoded points seeks frames seeks frames fixed 10s 1.222.041.161.83 100s6.3712.26 2.223.21 - 16.80 341.61 3.845.23 variable 10s 1.182.931.422.57 100s8.8016.65 2.583.67 - 31.35 445.08 4.285.67 And another table shows the worst seek for the same files. block size seekdecoded decoded points seeks frames seeks frames fixed 10s 49 49 3 6 100s55 55 5 8 - 1 10386 9 variable 10s 53 53 9 9 100s87 9 8 8 - 1 162010 13 Attached is the new patch. -- Miroslav Lichvar Index: stream_decoder.c === RCS file: /cvsroot/flac/flac/src/libFLAC/stream_decoder.c,v retrieving revision 1.117 diff -u -r1.117 stream_decoder.c --- stream_decoder.c16 Oct 2006 15:49:18 - 1.117 +++ stream_decoder.c28 Oct 2006 17:12:19 - @@ -2923,18 +2923,18 @@ FLAC__bool seek_to_absolute_sample_(FLAC__StreamDecoder *decoder, FLAC__uint64 stream_length, FLAC__uint64 target_sample) { - FLAC__uint64 first_frame_offset = decoder-private_-first_frame_offset, lower_bound, upper_bound; + FLAC__uint64 first_frame_offset = decoder-private_-first_frame_offset, lower_bound, upper_bound, lower_bound_sample, upper_bound_sample; FLAC__int64 pos = -1, last_pos = -1; - int i, lower_seek_point = -1, upper_seek_point = -1; + int i; unsigned approx_bytes_per_frame; - FLAC__uint64 last_frame_sample = FLAC__U64L(0x); + FLAC__uint64 last_frame_sample = FLAC__U64L(0x), this_frame_sample; FLAC__bool needs_seek; const FLAC__uint64 total_samples = FLAC__stream_decoder_get_total_samples(decoder); const unsigned min_blocksize = decoder-private_-stream_info.data.stream_info.min_blocksize; const unsigned max_blocksize = decoder-private_-stream_info.data.stream_info.max_blocksize; const unsigned max_framesize = decoder-private_-stream_info.data.stream_info.max_framesize; - const unsigned channels = FLAC__stream_decoder_get_channels(decoder); - const unsigned bps = FLAC__stream_decoder_get_bits_per_sample(decoder); + const unsigned channels = decoder-private_-stream_info.data.stream_info.channels; + const unsigned bps = decoder-private_-stream_info.data.stream_info.bits_per_sample; const FLAC__StreamMetadata_SeekTable *seek_table = decoder-private_-has_seek_table? decoder-private_-seek_table.data.seek_table : 0; /* we are just guessing here, but we want to guess high, not low */ @@ -2961,12 +2961,14 @@ * the first and last frames. */ lower_bound = first_frame_offset; + lower_bound_sample = 0; /* calc the upper_bound, beyond which we never want to seek */ if(max_framesize 0) upper_bound = stream_length - (max_framesize + 128 + 2); /* 128 for a possible ID3V1 tag, 2 for indexing differences */ else upper_bound = stream_length - ((channels * bps * FLAC__MAX_BLOCK_SIZE) / 8 + 128 + 2); + upper_bound_sample = total_samples 0 ? total_samples : target_sample; /* * Now we refine the bounds if we have a seektable with @@ -2981,7 +2983,7 @@ } if(i = 0) { /* i.e. we found a suitable seek point... */ lower_bound = first_frame_offset + seek_table-points[i].stream_offset; - lower_seek_point = i; + lower_bound_sample = seek_table-points[i].sample_number; } /* find the closest seek point target_sample, if it exists */ @@ -2991,98 +2993,32 @@ } if(i (int)seek_table-num_points) { /* i.e. we found a suitable seek